BACKGROUND OF THE INVENTION 

This application claims the benefit of the earlier filing date of United States 
provisional patent application serial number 60/024,050, filed on August 16, 1996, entitled 
"Long Wavelength Mutant Fluorescent Proteins" and patent application serial number 
08/706,408, filed on August 30, 1996, entitled "Long Wavelength Engineered Fluorescent 
Proteins," both of which are herein incorporated by reference. 

This invention was made in part with Government support under grant no. 
MCB 9418479 awarded by the National Science Foundation. The Government may have 
rights in this invention. 

Fluorescent molecules are attractive as reporter molecules in many assay 
systems because of their high sensitivity and ease of quantification. Recently, fluorescent 
proteins have been the focus of much attention because they can be produced in vivo by 
biological systems, and can be used to trace intracellular events without the need to be 
introduced into the cell through microinjection or permeabilization. The green fluorescent 
protein of Aequorea victoria is particularly interesting as a fluorescent protein. A cDNA for 
the protein has been cloned. (D.C. Prasher et al., "Primary structure of the Aequorea 
victoria green-fluorescent protein," Gene (1992) 111:229-33.) Not only can the primary 
amino acid sequence of the protein be expressed from the cDNA, but the expressed protein 
can fluoresce. This indicates that the protein can undergo the cyclization and oxidation 
believed to be necessary for fluorescence. Aequorea green fluorescent protein 
("GFP") is a stable, proteolysis-resistant single chain of 238 residues and has two absorption 
maxima at around 395 and 475 nm. The relative amplitudes of these two peaks are sensitive 
to environmental factors (W. W. Ward. Bioluminescence and Chemiluminescence (M. A. 
DeLuca and W. D. McElroy, eds) Academic Press pp. 235-242 (1981); W. W. Ward & S. 
H. Bokman Biochemistry 21:4535-4540 (1982); W. W. Ward et al. Photochem. Photobiol. 
35:803-808 (1982)) and illumination history (A. B. Cubitt et al. Trends Biochem. Sci. 
20:448-455 (1995)), presumably reflecting two or more ground states. Excitation at the 
primary absorption peak of 395 nm yields an emission maximum at 508 nm with a quantum 
yield of 0.72-0.85 (O. Shimomura and F.H. Johnson J. Cell. Comp. Physiol. 59:223 (1962); 
J. G. Morin and J. W. Hastings, J. Cell. Physiol. 77:313 (1971); H. Morise et al. 
Biochemistry 13:2656 (1974); W. W. Ward Photochem. Photobiol. Reviews (Smith, K. C. 
ed.) 4:1 (1979); A. B. Cubitt et al. Trends Biochem. Sci. 20:448-455 (1995); D. C. Prasher 
Trends Genet. 11:320-323 (1995); M. Chalfie Photochem. Photobiol. 62:651-656 (1995); 
W. W. Ward. Bioluminescence and Chemiluminescence (M. A. DeLuca and W. D. 
McElroy, eds) Academic Press pp. 235-242 (1981); W. W. Ward & S. H. Bokman 
Biochemistry 21:4535-4540 (1982); W. W. Ward et al. Photochem. Photobiol. 35:803-808 
(1982)). The fluorophore results from the autocatalytic cyclization of the polypeptide 
backbone between residues Ser 65 and Gly 67 and oxidation of the α-β bond of Tyr 66 (A. B. 
Cubitt et al. Trends Biochem. Sci. 20:448-455 (1995); C. W. Cody et al. Biochemistry 
32:1212-1218 (1993); R. Heim et al. Proc. Natl. Acad. Sci. USA 91:12501-12504 (1994)). 
Mutation of Ser 65 to Thr (S65T) simplifies the excitation spectrum to a single peak at 488 
nm of enhanced amplitude (R. Heim et al. Nature 373:664-665 (1995)), which no longer 
gives signs of conformational isomers (A. B. Cubitt et al. Trends Biochem. Sci. 20:448-455 
(1995)). 

Fluorescent proteins have been used as markers of gene expression, tracers of 
cell lineage and as fusion tags to monitor protein localization within living cells. (M. 
Chalfie et al., "Green fluorescent protein as a marker for gene expression," Science 263:802-805; 
A.B. Cubitt et al., "Understanding, improving and using green fluorescent proteins," 
TIBS 20, November 1995, pp. 448-455; U.S. patent 5,491,084, M. Chalfie and D. Prasher.) 
Furthermore, engineered versions of Aequorea green fluorescent protein have been 
identified that exhibit altered fluorescence characteristics, including altered excitation and 
emission maxima, as well as excitation and emission spectra of different shapes. (R. Heim 
et al., "Wavelength mutations and posttranslational autoxidation of green fluorescent 
protein," Proc. Natl. Acad. Sci. USA, (1994) 91:12501-04; R. Heim et al., "Improved green 
fluorescence," Nature (1995) 373:663-665.) These properties add variety and utility to the 
arsenal of biologically based fluorescent indicators. 

There is a need for engineered fluorescent proteins with varied fluorescent 
properties. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figs. 1A-1B. (A) Schematic drawing of the backbone of GFP produced by 
Molscript (J.P. Kraulis, J. Appl. Cryst., 24:946 (1991)). The chromophore is shown as a 
ball and stick model. (B) Schematic drawing of the overall fold of GFP. Approximate 
residue numbers mark the beginning and ending of the secondary structure elements. 

Figs. 2A-2C. (A) Stereo drawing of the chromophore and residues in the 
immediate vicinity. Carbon atoms are drawn as open circles, oxygen is filled and nitrogen is 
shaded. Solvent molecules are shown as isolated filled circles. (B) Portion of the final 2Fo-Fc 
electron density map contoured at 1.0 σ, showing the electron density surrounding the 
chromophore. (C) Schematic diagram showing the first and second spheres of coordination 
of the chromophore. Hydrogen bonds are shown as dashed lines and have the indicated 
lengths in Å. Inset: proposed structure of the carbinolamine intermediate that is presumably 
formed during generation of the chromophore. 

Fig. 3 depicts the nucleotide sequence (SEQ ID NO:1) and deduced amino 
acid sequence (SEQ ID NO:2) of an Aequorea green fluorescent protein. 

Fig. 4 depicts the nucleotide sequence (SEQ ID NO:3) and deduced amino 
acid sequence (SEQ ID NO:4) of the engineered Aequorea-related fluorescent protein 
S65G/S72A/T203Y utilizing preferred mammalian codons and optimal Kozak sequence. 

Figs. 5-1 to 5-28 present the coordinates for the crystal structure of 
Aequorea-related green fluorescent protein S65T. 

Fig. 6 shows the fluorescence excitation and emission spectra for engineered 
fluorescent proteins 20A and 10C (Table F). The vertical line at 528 nm compares the 
emission maxima of 10C, to the left of the line, and 20A, to the right of the line. 



SUMMARY OF THE INVENTION 

This invention provides functional engineered fluorescent proteins with 
varied fluorescence characteristics that can be easily distinguished from currently existing 
green and blue fluorescent proteins. Such engineered fluorescent proteins enable the 
simultaneous measurement of two or more processes within cells and can be used as 
fluorescence energy donors or acceptors when used to monitor protein-protein interactions 
through FRET. Longer wavelength engineered fluorescent proteins are particularly useful 
because photodynamic toxicity and auto-fluorescence of cells are significantly reduced at 
longer wavelengths. In particular, the introduction of the substitution T203X, wherein X is 
an aromatic amino acid, results in an increase in the excitation and emission wavelength 
maxima of Aequorea-related fluorescent proteins. 

In one aspect, this invention provides a nucleic acid molecule comprising a 
nucleotide sequence encoding a functional engineered fluorescent protein whose amino acid 
sequence is substantially identical to the amino acid sequence of Aequorea green fluorescent 
protein (SEQ ID NO:2) and which differs from SEQ ID NO:2 by at least an amino acid 
substitution located no more than about 0.5 nm from the chromophore of the engineered 
fluorescent protein, wherein the substitution alters the electronic environment of the 
chromophore, whereby the functional engineered fluorescent protein has a different 
fluorescent property than Aequorea green fluorescent protein. 

In one aspect this invention provides a nucleic acid molecule comprising a 
nucleotide sequence encoding a functional engineered fluorescent protein whose amino acid 
sequence is substantially identical to the amino acid sequence of Aequorea green fluorescent 
protein (SEQ ID NO:2) and which differs from SEQ ID NO:2 by at least a substitution at 
T203 and, in particular, T203X, wherein X is an aromatic amino acid selected from H, Y, W 
or F, said functional engineered fluorescent protein having a different fluorescent property 
than Aequorea green fluorescent protein. In one embodiment, the amino acid sequence 
further comprises a substitution at S65, wherein the substitution is selected from S65G, 
S65T, S65A, S65L, S65C, S65V and S65I. In another embodiment, the amino acid 
sequence differs by no more than the substitutions S65T/T203H; S65T/T203Y; 
S72A/F64L/S65G/T203Y; S65G/V68L/Q69K/S72A/T203Y; S72A/S65G/V68L/T203Y; 
S65G/S72A/T203Y; or S65G/S72A/T203W. In another embodiment, the amino acid 
sequence further comprises a substitution at Y66, wherein the substitution is selected from 
Y66H, Y66F, and Y66W. In another embodiment, the amino acid sequence further 
comprises a mutation from Table A. In another embodiment, the amino acid sequence 
further comprises a folding mutation. In another embodiment, the nucleotide sequence 
encoding the protein differs from the nucleotide sequence of SEQ ID NO:1 by the 
substitution of at least one codon by a preferred mammalian codon. In another embodiment, 
the nucleic acid molecule encodes a fusion protein wherein the fusion protein comprises a 
polypeptide of interest and the functional engineered fluorescent protein. 
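The slash-separated substitution notation used above (for example, S65G/S72A/T203Y) can be illustrated with a minimal Python sketch that lists the positions at which a variant differs from a reference sequence. The function name and the toy ten-residue sequences are hypothetical and are not taken from SEQ ID NO:2; positions are numbered from 1, and only substitutions (not insertions or deletions) are handled.

```python
def list_substitutions(reference: str, variant: str) -> str:
    """Return the differences between two equal-length sequences
    in 'E5A/G10Y'-style notation (1-based positions)."""
    if len(reference) != len(variant):
        raise ValueError("this sketch handles substitutions only; lengths must match")
    subs = [
        f"{ref}{pos}{var}"
        for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]
    return "/".join(subs)


# Toy sequences for illustration only (hypothetical, not SEQ ID NO:2).
reference = "MSKGEELFTG"
variant = "MSKGAELFTY"
print(list_substitutions(reference, variant))  # E5A/G10Y
```

A variant that "differs by no more than" a stated set of substitutions would, under this sketch, produce exactly that set and nothing else.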

In another aspect, this invention provides a nucleic acid molecule comprising 
a nucleotide sequence encoding a functional engineered fluorescent protein whose amino 
acid sequence is substantially identical to the amino acid sequence of Aequorea green 
fluorescent protein (SEQ ID NO:2) and which differs from SEQ ID NO:2 by at least an 
amino acid substitution at L42, V61, T62, V68, Q69, Q94, N121, Y145, H148, V150, F165, 
I167, Q183, N185, L220, E222 (not E222G), or V224, said functional engineered 
fluorescent protein having a different fluorescent property than Aequorea green fluorescent 
protein. In one embodiment, the amino acid substitution is: 

L42X, wherein X is selected from C, F, H, W and Y, 

V61X, wherein X is selected from F, Y, H and C, 

T62X, wherein X is selected from A, V, F, S, D, N, Q, Y, H and C, 

V68X, wherein X is selected from F, Y and H, 

Q69X, wherein X is selected from K, R, E and G, 

Q94X, wherein X is selected from D, E, H, K and N, 

N121X, wherein X is selected from F, H, W and Y, 

Y145X, wherein X is selected from W, C, F, L, E, H, K and Q, 

H148X, wherein X is selected from F, Y, N, K, Q and R, 

V150X, wherein X is selected from F, Y and H, 

F165X, wherein X is selected from H, Q, W and Y, 

I167X, wherein X is selected from F, Y and H, 

Q183X, wherein X is selected from H, Y, E and K, 

N185X, wherein X is selected from D, E, H, K and Q, 

L220X, wherein X is selected from H, N, Q and T, 

E222X, wherein X is selected from N and Q, or 
V224X, wherein X is selected from H, N, Q, T, F, W and Y. 
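The list above can be read as a lookup table from a wild-type position to the replacement residues named for it. The following Python sketch, offered only as an illustration, transcribes that list into a dictionary and checks whether a given single substitution appears in it. The names ALLOWED and is_listed are hypothetical, and the table covers only this list (it says nothing about T203 or S65, which are addressed in other aspects).

```python
import re

# Replacement residues per position, transcribed from the list above.
ALLOWED = {
    "L42": "CFHWY",
    "V61": "FYHC",
    "T62": "AVFSDNQYHC",
    "V68": "FYH",
    "Q69": "KREG",
    "Q94": "DEHKN",
    "N121": "FHWY",
    "Y145": "WCFLEHKQ",
    "H148": "FYNKQR",
    "V150": "FYH",
    "F165": "HQWY",
    "I167": "FYH",
    "Q183": "HYEK",
    "N185": "DEHKQ",
    "L220": "HNQT",
    "E222": "NQ",
    "V224": "HNQTFWY",
}


def is_listed(substitution: str) -> bool:
    """Return True if a substitution such as 'H148Q' appears in ALLOWED."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", substitution)
    if match is None:
        raise ValueError(f"not a single-substitution label: {substitution!r}")
    wild_type, position, replacement = match.groups()
    return replacement in ALLOWED.get(f"{wild_type}{position}", "")


print(is_listed("H148Q"))  # True
print(is_listed("E222G"))  # False
```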

In a further aspect, this invention provides an expression vector comprising 
expression control sequences operatively linked to any of the aforementioned nucleic acid 
molecules. In a further aspect, this invention provides a recombinant host cell comprising 
the aforementioned expression vector. 

In another aspect, this invention provides a functional engineered fluorescent 
protein whose amino acid sequence is substantially identical to the amino acid sequence of 
Aequorea green fluorescent protein (SEQ ID NO:2) and which differs from SEQ ID NO:2 
by at least an amino acid substitution located no more than about 0.5 nm from the 
chromophore of the engineered fluorescent protein, wherein the substitution alters the 
electronic environment of the chromophore, whereby the functional engineered fluorescent 
protein has a different fluorescent property than Aequorea green fluorescent protein. 

In another aspect, this invention provides a functional engineered fluorescent 
protein whose amino acid sequence is substantially identical to the amino acid sequence of 
Aequorea green fluorescent protein (SEQ ID NO:2) and which differs from SEQ ID NO:2 
by at least the amino acid substitution at T203, and in particular, T203X, wherein X is an 
aromatic amino acid selected from H, Y, W or F, said functional engineered fluorescent 
protein having a different fluorescent property than Aequorea green fluorescent protein. In 
one embodiment, the amino acid sequence further comprises a substitution at S65, wherein 
the substitution is selected from S65G, S65T, S65A, S65L, S65C, S65V and S65I. In 
another embodiment, the amino acid sequence differs by no more than the substitutions 
S65T/T203H; S65T/T203Y; S72A/F64L/S65G/T203Y; S72A/S65G/V68L/T203Y; 
S65G/V68L/Q69K/S72A/T203Y; S65G/S72A/T203Y; or S65G/S72A/T203W. In another 
embodiment, the amino acid sequence further comprises a substitution at Y66, wherein the 
substitution is selected from Y66H, Y66F, and Y66W. In another embodiment, the amino 
acid sequence further comprises a folding mutation. In another embodiment, the engineered 
fluorescent protein is part of a fusion protein wherein the fusion protein comprises a 
polypeptide of interest and the functional engineered fluorescent protein. 

In another aspect this invention provides a functional engineered fluorescent 
protein whose amino acid sequence is substantially identical to the amino acid sequence of 
Aequorea green fluorescent protein (SEQ ID NO:2) and which differs from SEQ ID NO:2 
by at least an amino acid substitution at L42, V61, T62, V68, Q69, Q94, N121, Y145, 
H148, V150, F165, I167, Q183, N185, L220, E222, or V224, said functional engineered 
fluorescent protein having a different fluorescent property than Aequorea green fluorescent 
protein. 

In another aspect, this invention provides a fluorescently labelled antibody 
comprising an antibody coupled to any of the aforementioned functional engineered 
fluorescent proteins. In one embodiment, the fluorescently labelled antibody is a fusion 
protein wherein the fusion protein comprises the antibody fused to the functional engineered 
fluorescent protein. 

In another aspect, this invention provides a nucleic acid molecule comprising 
a nucleotide sequence encoding an antibody fused to a nucleotide sequence encoding a 
functional engineered fluorescent protein of this invention. 

In another aspect, this invention provides a fluorescently labelled nucleic 
acid probe comprising a nucleic acid probe coupled to a functional engineered fluorescent 
protein of this invention. The fusion can be through a linker 
peptide. 

In another aspect, this invention provides a method for determining whether 
a mixture contains a target comprising contacting the mixture with a fluorescently labelled 
probe comprising a probe and a functional engineered fluorescent protein of this invention; 
and determining whether the target has bound to the probe. In one embodiment, the target 
molecule is captured on a solid matrix. 

In another aspect, this invention provides a method for engineering a 
functional engineered fluorescent protein having a fluorescent property different than 
Aequorea green fluorescent protein, comprising substituting an amino acid that is located no 
more than 0.5 nm from any atom in the chromophore of an Aequorea-related green 
fluorescent protein with another amino acid; whereby the substitution alters a fluorescent 
property of the protein. In one embodiment, the amino acid substitution alters the electronic 
environment of the chromophore. 
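The selection criterion in this method (an amino acid located no more than 0.5 nm from any atom in the chromophore) can be sketched as a simple distance filter. The Python example below is an illustration only: the function name, the residue labels, and the coordinates are hypothetical and are not taken from Figs. 5-1 to 5-28. Coordinates are assumed to be in nanometers; coordinates given in angstroms would use a cutoff of 5 (0.5 nm equals 5 angstroms).

```python
import math


def residues_near(chromophore_atoms, protein_atoms, cutoff=0.5):
    """Return sorted labels of residues having at least one atom within
    `cutoff` of any chromophore atom.

    chromophore_atoms: list of (x, y, z) tuples
    protein_atoms: list of (residue_label, (x, y, z)) tuples
    """
    near = set()
    for label, xyz in protein_atoms:
        if any(math.dist(xyz, c) <= cutoff for c in chromophore_atoms):
            near.add(label)
    return sorted(near)


# Hypothetical coordinates in nm, for illustration only.
chromophore = [(0.0, 0.0, 0.0), (0.14, 0.0, 0.0)]
protein = [
    ("res1", (0.3, 0.2, 0.0)),    # about 0.36 nm from the first chromophore atom
    ("res1", (0.9, 0.9, 0.9)),    # a distant atom of the same residue
    ("res2", (2.0, 2.0, 2.0)),    # far from the chromophore
    ("res3", (0.0, -0.45, 0.0)),  # 0.45 nm from the first chromophore atom
]
print(residues_near(chromophore, protein))  # ['res1', 'res3']
```

A residue qualifies as soon as any one of its atoms falls within the cutoff, which matches the "from any atom in the chromophore" wording when applied atom by atom.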

In another aspect, this invention provides a method for engineering a 
functional engineered fluorescent protein having a different fluorescent property than 
Aequorea green fluorescent protein comprising substituting amino acids in a loop domain of 
an Aequorea-related green fluorescent protein with amino acids so as to create a consensus 
sequence for phosphorylation or for proteolysis. 

In another aspect, this invention provides a method for producing 
fluorescence resonance energy transfer comprising providing a donor molecule comprising 
a functional engineered fluorescent protein of this invention; providing an appropriate acceptor 
molecule for the fluorescent protein; and bringing the donor molecule and the acceptor 
molecule into sufficiently close contact to allow fluorescence resonance energy transfer. 

In another aspect, this invention provides a method for producing 
fluorescence resonance energy transfer comprising providing an acceptor molecule 
comprising a functional engineered fluorescent protein of this invention; providing an 
appropriate donor molecule for the fluorescent protein; and bringing the donor molecule and 
the acceptor molecule into sufficiently close contact to allow fluorescence resonance energy 
transfer. In one embodiment, the donor molecule is an engineered fluorescent protein whose 
amino acid sequence comprises the substitution T203I and the acceptor molecule is an 
engineered fluorescent protein whose amino acid sequence comprises the substitution 
T203X, wherein X is an aromatic amino acid selected from H, Y, W or F, said functional 
engineered fluorescent protein having a different fluorescent property than Aequorea green 
fluorescent protein. 
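The phrase "sufficiently close contact" can be made concrete with the standard Förster relation, E = 1 / (1 + (r/R0)^6), which is general background and is not stated in this specification. The Python sketch below evaluates it; the Förster distance of 5.0 nm used in the example is a hypothetical value chosen for illustration and is not a measured property of any protein described here.

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Energy transfer efficiency from the standard Forster relation.

    r_nm:  donor-acceptor distance in nm
    r0_nm: Forster distance in nm (distance at which efficiency is 0.5)
    """
    if r_nm < 0 or r0_nm <= 0:
        raise ValueError("r_nm must be >= 0 and r0_nm must be > 0")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


# Hypothetical Forster distance, for illustration only.
R0 = 5.0
for r in (2.5, 5.0, 10.0):
    print(f"r = {r:4.1f} nm  E = {fret_efficiency(r, R0):.3f}")
```

Because efficiency falls with the sixth power of distance, transfer is appreciable only when donor and acceptor are within roughly twice the Förster distance of each other.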

In another aspect, this invention provides a crystal of a protein comprising a 
fluorescent protein with an amino acid sequence substantially identical to SEQ ID NO: 2, 
wherein said crystal diffracts with at least a 2.0 to 3.0 angstrom resolution. 

In another embodiment, this invention provides a computational method of 
designing a fluorescent protein comprising determining from a three dimensional model of a 
crystallized fluorescent protein comprising a fluorescent protein with a bound ligand, at 
least one interacting amino acid of the fluorescent protein that interacts with at least one 
first chemical moiety of the ligand, and selecting at least one chemical modification of the 
first chemical moiety to produce a second chemical moiety with a structure to either 
decrease or increase an interaction between the interacting amino acid and the second 
chemical moiety compared to the interaction between the interacting amino acid and the 
first chemical moiety. 

In another embodiment, this invention provides a computational method of 
modeling the three dimensional structure of a fluorescent protein comprising determining a 
three dimensional relationship between at least two atoms listed in the atomic coordinates of 
Figs. 5-1 to 5-28. 
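One simple "three dimensional relationship" between atoms is an interatomic distance; with three atoms, an angle can also be computed. The Python sketch below shows both under stated assumptions: the function names are hypothetical, and the coordinates are made-up values rather than entries from Figs. 5-1 to 5-28.

```python
import math


def distance(a, b):
    """Euclidean distance between two atoms given as (x, y, z) tuples."""
    return math.dist(a, b)


def angle_degrees(a, b, c):
    """Angle a-b-c in degrees, with atom b at the vertex."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(p * q for p, q in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("atoms a and c must not coincide with atom b")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


# Made-up coordinates, for illustration only.
atom_a = (3.0, 4.0, 0.0)
atom_b = (0.0, 0.0, 0.0)
atom_c = (0.0, 0.0, 2.0)
print(distance(atom_a, atom_b))               # 5.0
print(angle_degrees(atom_a, atom_b, atom_c))  # 90.0
```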

In another embodiment, this invention provides a device comprising a 
storage device and, stored in the device, at least 10 atomic coordinates selected from the 
atomic coordinates listed in Figs. 5-1 to 5-28. In one embodiment, the storage device is a 
computer readable device that stores code that receives as input the atomic coordinates. In 
another embodiment, the computer readable device is a floppy disk or a hard drive. 

DETAILED DESCRIPTION OF THE INVENTION 

I. DEFINITIONS 

Unless defined otherwise, all technical and scientific terms used herein have 
the same meaning as commonly understood by those of ordinary skill in the art to which 
this invention belongs. Although any methods and materials similar or equivalent to those 
described herein can be used in the practice or testing of the present invention, the preferred 
methods and materials are described. For purposes of the present invention, the following 
terms are defined below. 

"Binding pair" refers to two moieties (e.g. chemical or biochemical) that have an 
affinity for one another. Examples of binding pairs include antigen/antibodies, 
lectin/avidin, target polynucleotide/probe oligonucleotide, antibody/anti-antibody, 
receptor/ligand, enzyme/ligand and the like. "One member of a binding pair" refers to one 
moiety of the pair, such as an antigen or ligand. 

"Nucleic acid" refers to a deoxyribonucleotide or ribonucleotide polymer in either 
single- or double-stranded form, and, unless otherwise limited, encompasses known analogs 
of natural nucleotides that can function in a similar manner as naturally occurring 
nucleotides. It will be understood that when a nucleic acid molecule is represented by a 
DNA sequence, this also includes RNA molecules having the corresponding RNA sequence 
in which "U" replaces "T." 
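The correspondence between a DNA sequence and its RNA sequence described in this definition amounts to a single character replacement. A minimal Python sketch follows; the function name and the example sequence are hypothetical.

```python
def dna_to_rna(dna: str) -> str:
    """Return the corresponding RNA sequence, in which U replaces T."""
    dna = dna.upper()
    unexpected = set(dna) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return dna.replace("T", "U")


print(dna_to_rna("ATGAGT"))  # AUGAGU
```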

"Recombinant nucleic acid molecule" refers to a nucleic acid molecule which 
is not naturally occurring, and which comprises two nucleotide sequences which are not 
naturally joined together. Recombinant nucleic acid molecules are produced by artificial 
recombination, e.g., genetic engineering techniques or chemical synthesis. 

Reference to a nucleotide sequence "encoding" a polypeptide means that the 
sequence, upon transcription and translation of mRNA, produces the polypeptide. This 
includes both the coding strand, whose nucleotide sequence is identical to mRNA and 
whose sequence is usually provided in the sequence listing, as well as its complementary 
strand, which is used as the template for transcription. As any person skilled in the art 
recognizes, this also includes all degenerate nucleotide sequences encoding the same amino 
acid sequence. Nucleotide sequences encoding a polypeptide include sequences containing 
introns. 
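The relationship between the coding strand and its complementary template strand can be sketched in Python as a reverse complement. The function name and the six-base example are hypothetical; both strands are written 5' to 3', which is why the complement is reversed.

```python
# Watson-Crick complement for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def template_strand(coding: str) -> str:
    """Return the strand complementary to `coding`, written 5' to 3'."""
    coding = coding.upper()
    unexpected = set(coding) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return coding.translate(_COMPLEMENT)[::-1]


coding = "ATGGCC"
print(template_strand(coding))  # GGCCAT
```

Applying the function twice returns the original coding strand, which is a convenient sanity check.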

"Expression control sequences" refers to nucleotide sequences that regulate 
the expression of a nucleotide sequence to which they are operatively linked. Expression 
control sequences are "operatively linked" to a nucleotide sequence when the expression 
control sequences control and regulate the transcription and, as appropriate, translation of 
the nucleotide sequence. Thus, expression control sequences can include appropriate 
promoters, enhancers, transcription terminators, a start codon (i.e., ATG) in front of a 
protein-encoding gene, splicing signals for introns, maintenance of the correct reading frame 
of that gene to permit proper translation of the mRNA, and stop codons. 

"Naturally-occurring" as used herein, as applied to an object, refers to the fact that an 
object can be found in nature. For example, a polypeptide or polynucleotide sequence that 
is present in an organism (including viruses) that can be isolated from a source in nature and 
which has not been intentionally modified by man in the laboratory is naturally-occurring. 

"Operably linked" refers to a juxtaposition wherein the components so described are 
in a relationship permitting them to function in their intended manner. A control sequence 
"operably linked" to a coding sequence is ligated in such a way that expression of the 
coding sequence is achieved under conditions compatible with the control sequences, such 
as when the appropriate molecules (e.g., inducers and polymerases) are bound to the control 
or regulatory sequence(s). 

"Control sequence" refers to polynucleotide sequences which are necessary to effect 
the expression of coding and non-coding sequences to which they are ligated. The nature of 
such control sequences differs depending upon the host organism; in prokaryotes, such 
control sequences generally include promoter, ribosomal binding site, and transcription 
termination sequence; in eukaryotes, generally, such control sequences include promoters 
and transcription termination sequence. The term "control sequences" is intended to 
include, at a minimum, components whose presence can influence expression, and can also 
include additional components whose presence is advantageous, for example, leader 
sequences and fusion partner sequences. 

"Isolated polynucleotide" refers to a polynucleotide of genomic, cDNA, or synthetic 
origin or some combination thereof, which by virtue of its origin the "isolated 
polynucleotide" (1) is not associated with the cell in which the "isolated polynucleotide" is 
found in nature, or (2) is operably linked to a polynucleotide which it is not linked to in 
nature. 

"Polynucleotide" refers to a polymeric form of nucleotides of at least 10 bases in 
length, either ribonucleotides or deoxynucleotides or a modified form of either type of 
nucleotide. The term includes single and double stranded forms of DNA. 

The term "probe" refers to a substance that specifically binds to another 
substance (a "target"). Probes include, for example, antibodies, nucleic acids, receptors and 
their ligands. 

"Modulation" refers to the capacity to either enhance or inhibit a functional property 
of biological activity or process (e.g., enzyme activity or receptor binding); such 
enhancement or inhibition may be contingent on the occurrence of a specific event, such as 
activation of a signal transduction pathway, and/or may be manifest only in particular cell 
types. 

The term "modulator" refers to a chemical (naturally occurring or non-naturally 
occurring), such as a synthetic molecule (e.g., nucleic acid, protein, non-peptide, or organic 
molecule), or an extract made from biological materials such as bacteria, plants, fungi, or 
animal (particularly mammalian) cells or tissues. Modulators can be evaluated for potential 
activity as inhibitors or activators (directly or indirectly) of a biological process or processes 
(e.g., agonist, partial antagonist, partial agonist, inverse agonist, antagonist, antineoplastic 
agents, cytotoxic agents, inhibitors of neoplastic transformation or cell proliferation, cell 
proliferation-promoting agents, and the like) by inclusion in screening assays described 
herein. The activity of a modulator may be known, unknown or partially known. 

The term "test chemical" refers to a chemical to be tested by one or more screening 
method(s) of the invention as a putative modulator. A test chemical is usually not known to 
bind to the target of interest. The term "control test chemical" refers to a chemical known 
to bind to the target (e.g., a known agonist, antagonist, partial agonist or inverse agonist). 
Usually, various predetermined concentrations of test chemicals are used for screening, such 
as 0.01 µM, 0.1 µM, 1.0 µM, and 10.0 µM. 

The term "target" refers to a biochemical entity involved in a biological process. 
Targets are typically proteins that play a useful role in the physiology or biology of an 
organism. A therapeutic chemical binds to a target to alter or modulate its function. As used 
herein, targets can include cell surface receptors, G-proteins, kinases, ion channels, 
phospholipases and other proteins mentioned herein. 

The term "label" refers to a composition detectable by spectroscopic, 
photochemical, biochemical, immunochemical, or chemical means. For example, useful 
labels include 32P, fluorescent dyes, fluorescent proteins, electron-dense reagents, enzymes 
(e.g., as commonly used in an ELISA), biotin, digoxigenin, or haptens and proteins for which 
antisera or monoclonal antibodies are available. For example, polypeptides of this invention 
can be made as detectable labels, by e.g., incorporating them into a polypeptide, and 
used to label antibodies specifically reactive with the polypeptide. A label often generates a 
measurable signal, such as radioactivity, fluorescent light or enzyme activity, which can be 
used to quantitate the amount of bound label. 

The term "nucleic acid probe" refers to a nucleic acid molecule that binds to 
a specific sequence or sub-sequence of another nucleic acid molecule. A probe is preferably 
a nucleic acid molecule that binds through complementary base pairing to the full sequence 
or to a sub-sequence of a target nucleic acid. It will be understood that probes may bind 
target sequences lacking complete complementarity with the probe sequence depending 
upon the stringency of the hybridization conditions. Probes are preferably directly labelled 
as with isotopes, chromophores, lumiphores, chromogens, fluorescent proteins, or indirectly 
labelled such as with biotin to which a streptavidin complex may later bind. By assaying 
for the presence or absence of the probe, one can detect the presence or absence of the select 
sequence or sub-sequence. 

A "labeled nucleic acid probe" is a nucleic acid probe that is bound, either 
covalently, through a linker, or through ionic, van der Waals or hydrogen bonds to a label 
such that the presence of the probe may be detected by detecting the presence of the label 
bound to the probe. 

The terms "polypeptide" and "protein" refer to a polymer of amino acid 
residues. The terms apply to amino acid polymers in which one or more amino acid residues 
are artificial chemical analogues of corresponding naturally occurring amino acids, as well 
as to naturally occurring amino acid polymers. The term "recombinant protein" refers to a 
protein that is produced by expression of a nucleotide sequence encoding the amino acid 
sequence of the protein from a recombinant DNA molecule. 

The term "recombinant host cell" refers to a cell that comprises a 
recombinant nucleic acid molecule. Thus, for example, recombinant host cells can express 
genes that are not found within the native (non-recombinant) form of the cell. 

The terms "isolated," "purified" or "biologically pure" refer to material which 
is substantially or essentially free from components which normally accompany it as found 
in its native state. Purity and homogeneity are typically determined using analytical 
chemistry techniques such as polyacrylamide gel electrophoresis or high performance liquid 
chromatography. A protein or nucleic acid molecule which is the predominant protein or 
nucleic acid species present in a preparation is substantially purified. Generally, an isolated 
protein or nucleic acid molecule will comprise more than 80% of all macromolecular 
species present in the preparation. Preferably, the protein is purified to represent greater 
than 90% of all macromolecular species present. More preferably the protein is purified to 
greater than 95%, and most preferably the protein is purified to essential homogeneity, 
wherein other macromolecular species are not detected by conventional techniques. 

The term "naturally-occurring" as applied to an object refers to the fact that 
an object can be found in nature. For example, a polypeptide or polynucleotide sequence 
that is present in an organism (including viruses) that can be isolated from a source in nature 
and which has not been intentionally modified by man in the laboratory is 
naturally-occurring. 

The term "antibody" refers to a polypeptide substantially encoded by an
immunoglobulin gene or immunoglobulin genes, or fragments thereof, which specifically
bind and recognize an analyte (antigen). The recognized immunoglobulin genes include the
kappa, lambda, alpha, gamma, delta, epsilon and mu constant region genes, as well as the
myriad immunoglobulin variable region genes. Antibodies exist, e.g., as intact
immunoglobulins or as a number of well characterized fragments produced by digestion
with various peptidases. This includes, e.g., Fab' and F(ab')2 fragments. The term
"antibody," as used herein, also includes antibody fragments either produced by the
modification of whole antibodies or those synthesized de novo using recombinant DNA
methodologies.

The term "immunoassay" refers to an assay that utilizes an antibody to
specifically bind an analyte. The immunoassay is characterized by the use of specific
binding properties of a particular antibody to isolate, target, and/or quantify the analyte.

The term "identical" in the context of two nucleic acid or polypeptide
sequences refers to the residues in the two sequences which are the same when aligned for
maximum correspondence. When percentage of sequence identity is used in reference to
proteins or peptides it is recognized that residue positions which are not identical often
differ by conservative amino acid substitutions, where amino acid residues are substituted
for other amino acid residues with similar chemical properties (e.g., charge or
hydrophobicity) and therefore do not change the functional properties of the molecule.
Where sequences differ in conservative substitutions, the percent sequence identity may be
adjusted upwards to correct for the conservative nature of the substitution. Means for
making this adjustment are well known to those of skill in the art. Typically this involves
scoring a conservative substitution as a partial rather than a full mismatch, thereby
increasing the percentage sequence identity. Thus, for example, where an identical amino
acid is given a score of 1 and a non-conservative substitution is given a score of zero, a
conservative substitution is given a score between zero and 1. The scoring of conservative
substitutions is calculated, e.g., according to a known algorithm. See, e.g., Meyers and
Miller, Computer Applic. Biol. Sci., 4: 11-17 (1988); Smith and Waterman (1981) Adv.
Appl. Math. 2: 482; Needleman and Wunsch (1970) J. Mol. Biol. 48: 443; Pearson and
Lipman (1988) Proc. Natl. Acad. Sci. USA 85: 2444; Higgins and Sharp (1988) Gene, 73:
237-244 and Higgins and Sharp (1989) CABIOS 5: 151-153; Corpet, et al. (1988) Nucleic
Acids Research 16, 10881-90; Huang, et al. (1992) Computer Applications in the
Biosciences 8, 155-65, and Pearson, et al. (1994) Methods in Molecular Biology 24, 307-31.
Alignment is also often performed by inspection and manual alignment.
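
The partial-score adjustment described above can be sketched as follows. This is only an illustration, not the method of any of the cited algorithms: the function names, the example sequences and the score of 0.5 for a conservative substitution are assumptions chosen for the sketch, and the two sequences are assumed to be already aligned, gap-free and of equal length.

```python
# Illustrative sketch only: names and the 0.5 partial score are assumptions.
# The groups are the six conservative-substitution groups used in this document.
CONSERVATIVE_GROUPS = ["AST", "DE", "NQ", "RK", "ILMV", "FYW"]


def is_conservative(a, b):
    """Return True if two different residues belong to the same group."""
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def percent_identity(seq1, seq2, conservative_score=0.0):
    """Percent identity of two pre-aligned, gap-free, equal-length sequences.

    Identical positions score 1, non-conservative substitutions score 0, and
    conservative substitutions score `conservative_score` (between 0 and 1).
    """
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    total = 0.0
    for a, b in zip(seq1, seq2):
        if a == b:
            total += 1.0
        elif is_conservative(a, b):
            total += conservative_score
    return 100.0 * total / len(seq1)


# Hypothetical ten-residue example with three conservative substitutions
# (K->R, E->D, T->S) and seven identical positions.
strict = percent_identity("MSKGEELFTG", "MSRGDELFSG")
adjusted = percent_identity("MSKGEELFTG", "MSRGDELFSG", conservative_score=0.5)
print(strict, adjusted)
```

With these inputs the unadjusted identity is 70% and the adjusted identity is 85%, showing how partial scoring raises the reported percentage.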

"Conservatively modified variations" of a particular nucleic acid sequence
refers to those nucleic acids which encode identical or essentially identical amino acid
sequences, or where the nucleic acid does not encode an amino acid sequence, to essentially
identical sequences. Because of the degeneracy of the genetic code, a large number of
functionally identical nucleic acids encode any given polypeptide. For instance, the codons
CGU, CGC, CGA, CGG, AGA, and AGG all encode the amino acid arginine. Thus, at
every position where an arginine is specified by a codon, the codon can be altered to any of
the corresponding codons described without altering the encoded polypeptide. Such nucleic
acid variations are "silent variations," which are one species of "conservatively modified
variations." Every nucleic acid sequence herein which encodes a polypeptide also describes
every possible silent variation. One of skill will recognize that each codon in a nucleic acid
(except AUG, which is ordinarily the only codon for methionine) can be modified to yield a
functionally identical molecule by standard techniques. Accordingly, each "silent variation"
of a nucleic acid which encodes a polypeptide is implicit in each described sequence.
Furthermore, one of skill will recognize that individual substitutions, deletions or additions
which alter, add or delete a single amino acid or a small percentage of amino acids
(typically less than 5%, more typically less than 1%) in an encoded sequence are
"conservatively modified variations" where the alterations result in the substitution of an
amino acid with a chemically similar amino acid. Conservative amino acid substitutions
providing functionally similar amino acids are well known in the art. The following six
groups each contain amino acids that are conservative substitutions for one another:

1) Alanine (A), Serine (S), Threonine (T); 

2) Aspartic acid (D), Glutamic acid (E); 

3) Asparagine (N), Glutamine (Q); 

4) Arginine (R), Lysine (K); 

5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); and 

6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W). 
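
The degeneracy argument made above for silent variations can be illustrated with a short sketch. It assumes the standard genetic code, which the text does not reproduce, and the function names are hypothetical. It lists the codons for a given amino acid and counts how many distinct RNA coding sequences encode the same peptide.

```python
from itertools import product

# Standard genetic code (an assumption of this sketch), written in the usual
# U, C, A, G order; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(amino_acid):
    """All codons that encode the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def count_silent_variants(peptide):
    """Number of distinct coding sequences for a peptide (original included)."""
    count = 1
    for amino_acid in peptide:
        count *= len(synonymous_codons(amino_acid))
    return count


print(synonymous_codons("R"))       # the six arginine codons named in the text
print(synonymous_codons("M"))       # AUG only
print(count_silent_variants("MR"))  # one Met codon times six Arg codons
```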

The term "complementary" means that one nucleic acid molecule has the
sequence of the binding partner of another nucleic acid molecule. Thus, the sequence
5'-ATGC-3' is complementary to the sequence 5'-GCAT-3'.
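
As a minimal sketch of this definition (the function name is hypothetical), the binding partner of a DNA sequence, read 5' to 3', is obtained by complementing each base and reversing the order:

```python
# Watson-Crick base pairing for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the complementary strand of `seq`, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


print(reverse_complement("ATGC"))  # GCAT, matching the example in the text
```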

An amino acid sequence or a nucleotide sequence is "substantially identical" 
or "substantially similar" to a reference sequence if the amino acid sequence or nucleotide 
sequence has at least 80% sequence identity with the reference sequence over a given 
comparison window. Thus, substantially similar sequences include those having, for 
example, at least 85% sequence identity, at least 90% sequence identity, at least 95% 
sequence identity or at least 99% sequence identity. Two sequences that are identical to 
each other are, of course, also substantially identical. 

A subject nucleotide sequence is "substantially complementary" to a
reference nucleotide sequence if the complement of the subject nucleotide sequence is
substantially identical to the reference nucleotide sequence.

The term "stringent conditions" refers to a temperature and ionic conditions
used in nucleic acid hybridization. Stringent conditions are sequence dependent and are
different under different environmental parameters. Generally, stringent conditions are
selected to be about 5°C to 20°C lower than the thermal melting point (Tm) for the specific
sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic
strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched
probe.

The term "allelic variants" refers to polymorphic forms of a gene at a 
particular genetic locus, as well as cDNAs derived from mRNA transcripts of the genes and 
the polypeptides encoded by them. 

The term "preferred mammalian codon" refers to the subset of codons from
among the set of codons encoding an amino acid that are most frequently used in proteins
expressed in mammalian cells as chosen from the following list:

Amino Acid    Preferred codons for high level mammalian expression

Gly           GGC, GGG
Glu           GAG
Asp           GAC
Val           GUG, GUC
Ala           GCC, GCU
Ser           AGC, UCC
Lys           AAG
Asn           AAC
Met           AUG
Ile           AUC
Thr           ACC
Trp           UGG
Cys           UGC
Tyr           UAU, UAC
Leu           CUG
Phe           UUC
Arg           CGC, AGG, AGA
Gln           CAG
His           CAC
Pro           CCC
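
One way such a list might be used is to back-translate a protein sequence into an RNA coding sequence built only from preferred mammalian codons. The sketch below is an illustration under stated assumptions: the function name is hypothetical, amino acids are keyed by one-letter code for convenience, and where several preferred codons are listed the first one is picked arbitrarily.

```python
# Preferred mammalian codons from the list above, keyed by one-letter code.
PREFERRED_CODONS = {
    "G": ["GGC", "GGG"], "E": ["GAG"], "D": ["GAC"], "V": ["GUG", "GUC"],
    "A": ["GCC", "GCU"], "S": ["AGC", "UCC"], "K": ["AAG"], "N": ["AAC"],
    "M": ["AUG"], "I": ["AUC"], "T": ["ACC"], "W": ["UGG"], "C": ["UGC"],
    "Y": ["UAU", "UAC"], "L": ["CUG"], "F": ["UUC"],
    "R": ["CGC", "AGG", "AGA"], "Q": ["CAG"], "H": ["CAC"], "P": ["CCC"],
}


def back_translate(protein):
    """Encode a protein with preferred codons (first listed codon is used)."""
    try:
        return "".join(PREFERRED_CODONS[aa][0] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"unknown amino acid: {err.args[0]}") from None


print(back_translate("MSKGEELF"))
```

Choosing the first listed codon is a simplification; any codon listed for a given amino acid would satisfy the definition equally well.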



Fluorescent molecules are useful in fluorescence resonance energy transfer
("FRET"). FRET involves a donor molecule and an acceptor molecule. To optimize the
efficiency and detectability of FRET between a donor and acceptor molecule, several factors
need to be balanced. The emission spectrum of the donor should overlap as much as
possible with the excitation spectrum of the acceptor to maximize the overlap integral.
Also, the quantum yield of the donor moiety and the extinction coefficient of the acceptor
should likewise be as high as possible to maximize R0, the distance at which energy transfer
efficiency is 50%. However, the excitation spectra of the donor and acceptor should overlap
as little as possible so that a wavelength region can be found at which the donor can be
excited efficiently without directly exciting the acceptor. Fluorescence arising from direct
excitation of the acceptor is difficult to distinguish from fluorescence arising from FRET.
Similarly, the emission spectra of the donor and acceptor should overlap as little as possible
so that the two emissions can be clearly distinguished. High fluorescence quantum yield of
the acceptor moiety is desirable if the emission from the acceptor is to be measured either as
the sole readout or as part of an emission ratio. One factor to be considered in choosing the
donor and acceptor pair is the efficiency of fluorescence resonance energy transfer between
them. Preferably, the efficiency of FRET between the donor and acceptor is at least 10%,
more preferably at least 50% and even more preferably at least 80%.
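
The relationship between transfer efficiency and donor-acceptor distance can be sketched with the standard Förster relation, E = 1 / (1 + (r/R0)^6). The relation is not written out in the text and is supplied here as background. The function names and the value of R0 below are hypothetical.

```python
def fret_efficiency(r, r0):
    """Foerster transfer efficiency at distance r for Foerster distance r0.

    r and r0 must be in the same units; at r == r0 the efficiency is 0.5.
    """
    if r < 0 or r0 <= 0:
        raise ValueError("r must be non-negative and r0 positive")
    return 1.0 / (1.0 + (r / r0) ** 6)


def distance_for_efficiency(efficiency, r0):
    """Distance at which the given efficiency (0 < efficiency < 1) is reached."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * ((1.0 - efficiency) / efficiency) ** (1.0 / 6.0)


R0_NM = 5.0  # hypothetical Foerster distance, in nm
for target in (0.10, 0.50, 0.80):
    print(target, round(distance_for_efficiency(target, R0_NM), 2))
```

Because of the sixth-power dependence, the preferred efficiencies of 10%, 50% and 80% correspond to distances that lie fairly close to R0 on either side.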

The term "fluorescent property" refers to the molar extinction coefficient at
an appropriate excitation wavelength, the fluorescence quantum efficiency, the shape of the
excitation spectrum or emission spectrum, the excitation wavelength maximum and
emission wavelength maximum, the ratio of excitation amplitudes at two different
wavelengths, the ratio of emission amplitudes at two different wavelengths, the excited state
lifetime, or the fluorescence anisotropy. A measurable difference in any one of these
properties between wild-type Aequorea GFP and the mutant form is useful. A measurable
difference can be determined by determining the amount of any quantitative fluorescent
property, e.g., the amount of fluorescence at a particular wavelength, or the integral of
fluorescence over the emission spectrum. Determining ratios of excitation amplitude or
emission amplitude at two different wavelengths ("excitation amplitude ratioing" and
"emission amplitude ratioing", respectively) is particularly advantageous because the
ratioing process provides an internal reference and cancels out variations in the absolute
brightness of the excitation source, the sensitivity of the detector, and light scattering or
quenching by the sample.
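
The cancelling effect of ratioing can be shown with a small numerical sketch. The wavelengths, intensity readings and gain factor below are invented for illustration, and the function name is hypothetical. Scaling both readings by the same factor, as a brighter lamp or a more sensitive detector would, leaves the ratio unchanged.

```python
def amplitude_ratio(amplitude_1, amplitude_2):
    """Ratio of two amplitudes measured at two different wavelengths."""
    if amplitude_2 == 0:
        raise ValueError("reference amplitude must be non-zero")
    return amplitude_1 / amplitude_2


# Hypothetical emission readings at two wavelengths (arbitrary units).
reading = {"em_480": 120.0, "em_535": 300.0}

# The same sample measured with a 2.5-fold brighter excitation source:
# every amplitude scales by the same factor.
gain = 2.5
brighter = {name: gain * value for name, value in reading.items()}

ratio_a = amplitude_ratio(reading["em_480"], reading["em_535"])
ratio_b = amplitude_ratio(brighter["em_480"], brighter["em_535"])
print(ratio_a, ratio_b)  # the two ratios agree although absolute signals differ
```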

II. LONG WAVELENGTH ENGINEERED FLUORESCENT PROTEINS 



A. Fluorescent Proteins 

As used herein, the term "fluorescent protein" refers to any protein capable of
fluorescence when excited with appropriate electromagnetic radiation. This includes
fluorescent proteins whose amino acid sequences are either naturally occurring or
engineered (i.e., analogs or mutants). Many cnidarians use green fluorescent proteins
("GFPs") as energy-transfer acceptors in bioluminescence. A "green fluorescent protein," as
used herein, is a protein that fluoresces green light. Similarly, "blue fluorescent proteins"
fluoresce blue light and "red fluorescent proteins" fluoresce red light. GFPs have been
isolated from the Pacific Northwest jellyfish, Aequorea victoria, the sea pansy, Renilla
reniformis, and Phialidium gregarium. W.W. Ward et al., Photochem. Photobiol., 35:803-
808 (1982); L.D. Levine et al., Comp. Biochem. Physiol., 72B:77-85 (1982).

A variety of Aequorea-related fluorescent proteins having useful excitation
and emission spectra have been engineered by modifying the amino acid sequence of a
naturally occurring GFP from Aequorea victoria. (D.C. Prasher et al., Gene, 111:229-233
(1992); R. Heim et al., Proc. Natl. Acad. Sci. USA, 91:12501-04 (1994); U.S. patent
application 08/337,915, filed November 10, 1994; International application
PCT/US95/14692, filed 11/10/95.)

As used herein, a fluorescent protein is an "Aequorea-related fluorescent
protein" if any contiguous sequence of 150 amino acids of the fluorescent protein has at
least 85% sequence identity with an amino acid sequence, either contiguous or non-
contiguous, from the 238 amino-acid wild-type Aequorea green fluorescent protein of Fig. 3
(SEQ ID NO:2). More preferably, a fluorescent protein is an Aequorea-related fluorescent
protein if any contiguous sequence of 200 amino acids of the fluorescent protein has at least
95% sequence identity with an amino acid sequence, either contiguous or non-contiguous,
from the wild type Aequorea green fluorescent protein of Fig. 3 (SEQ ID NO:2). Similarly,
the fluorescent protein may be related to Renilla or Phialidium wild-type fluorescent
proteins using the same standards.

Aequorea-related fluorescent proteins include, for example and without
limitation, wild-type (native) Aequorea victoria GFP (D.C. Prasher et al., "Primary structure
of the Aequorea victoria green fluorescent protein," Gene, (1992) 111:229-33), whose
nucleotide sequence (SEQ ID NO:1) and deduced amino acid sequence (SEQ ID NO:2) are
presented in Fig. 3; allelic variants of this sequence, e.g., Q80R, which has the glutamine
residue at position 80 substituted with arginine (M. Chalfie et al., Science, (1994) 263:802-
805); those engineered Aequorea-related fluorescent proteins described herein, e.g., in Table
A or Table F; variants that include one or more folding mutations; and fragments of these
proteins that are fluorescent, such as Aequorea green fluorescent protein from which the two
amino-terminal amino acids have been removed. Several of these contain different aromatic
amino acids within the central chromophore and fluoresce at a distinctly shorter wavelength
than wild type species. For example, engineered proteins P4 and P4-3 contain (in addition
to other mutations) the substitution Y66H, whereas W2 and W7 contain (in addition to other
mutations) Y66W. Other mutations both close to the chromophore region of the protein and
remote from it in primary sequence may affect the spectral properties of GFP and are listed
in the first part of the table below.

TABLE A

                                                Excitation   Emission    Extinct. Coeff.    Quantum
Clone       Mutation(s)                         max (nm)     max (nm)    (M^-1 cm^-1)       yield

Wild type   None                                395 (475)    508         21,000 (7,150)     0.77
P4          Y66H                                383          447         13,500             0.21
P4-3        Y66H, Y145F                         381          445         14,000             0.38
W7          Y66W, N146I, M153T, V163A, N212K    433 (453)    475 (501)   18,000 (17,100)    0.67
W2          Y66W, I123V, Y145H, H148R, M153T,   432 (453)    480         10,000 (9,600)     0.72
            V163A, N212K
S65T        S65T                                489          511         39,200             0.68
P4-1        S65T, M153A, K238E                  504 (396)    514         14,500 (8,600)     0.53
S65A        S65A                                471          504
S65C        S65C                                479          507
S65L        S65L                                484          510
Y66F        Y66F                                360          442
Y66W        Y66W                                458          480
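
The tabulated values can be compared numerically. The sketch below ranks the clones for which both an extinction coefficient and a quantum yield are given, using the product of the two at the main excitation peak. This product is a commonly used figure of merit for relative brightness; the text itself does not define it, so it is an assumption of this sketch, as are the variable and function names.

```python
# (extinction coefficient in M^-1 cm^-1 at the main peak, quantum yield),
# copied from Table A for the clones where both values are listed.
TABLE_A = {
    "Wild type": (21000, 0.77),
    "P4": (13500, 0.21),
    "P4-3": (14000, 0.38),
    "W7": (18000, 0.67),
    "W2": (10000, 0.72),
    "S65T": (39200, 0.68),
    "P4-1": (14500, 0.53),
}


def relative_brightness(extinction_coefficient, quantum_yield):
    """Product of extinction coefficient and quantum yield (assumed metric)."""
    return extinction_coefficient * quantum_yield


ranking = sorted(
    TABLE_A,
    key=lambda clone: relative_brightness(*TABLE_A[clone]),
    reverse=True,
)
for clone in ranking:
    print(f"{clone:10s} {relative_brightness(*TABLE_A[clone]):10.0f}")
```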



Additional mutations in Aequorea-related fluorescent proteins, referred to as
"folding mutations," improve the ability of fluorescent proteins to fold at higher
temperatures, and to be more fluorescent when expressed in mammalian cells, but have little
or no effect on the peak wavelengths of excitation and emission. It should be noted that
these may be combined with mutations that influence the spectral properties of GFP to
produce proteins with altered spectral and folding properties. Folding mutations include:
F64L, V68L, S72A, and also T44A, F99S, Y145F, N146I, M153T or A, V163A, I167T,
S175G, S205T and N212K.

As used herein, the term "loop domain" refers to an amino acid sequence of
an Aequorea-related fluorescent protein that connects the amino acids involved in the
secondary structure of the eleven strands of the β-barrel or the central α-helix (residues 56-
72) (see Fig. 1A and 1B).

As used herein, the "fluorescent protein moiety" of a fluorescent protein is
that portion of the amino acid sequence of a fluorescent protein which, when the amino acid
sequence of the fluorescent protein substrate is optimally aligned with the amino acid
sequence of a naturally occurring fluorescent protein, lies between the amino terminal and
carboxy terminal amino acids, inclusive, of the amino acid sequence of the naturally
occurring fluorescent protein.

It has been found that fluorescent proteins can be genetically fused to other
target proteins and used as markers to identify the location and amount of the target protein
produced. Accordingly, this invention provides fusion proteins comprising a fluorescent
protein moiety and additional amino acid sequences. Such sequences can be, for example,
up to about 15, up to about 50, up to about 150 or up to about 1000 amino acids long. The
fusion proteins possess the ability to fluoresce when excited by electromagnetic radiation.
In one embodiment, the fusion protein comprises a polyhistidine tag to aid in purification of
the protein.

B. Use Of The Crystal Structure Of Green Fluorescent Protein To Design
Mutants Having Altered Fluorescent Characteristics

Using X-ray crystallography and computer processing, we have created a
model of the crystal structure of Aequorea green fluorescent protein showing the relative
location of the atoms in the molecule. This information is useful in identifying amino acids
whose substitution alters fluorescent properties of the protein.

Fluorescent characteristics of Aequorea-related fluorescent proteins depend,
in part, on the electronic environment of the chromophore. In general, amino acids that are
within about 0.5 nm of the chromophore influence the electronic environment of the
chromophore. Therefore, substitution of such amino acids can produce fluorescent proteins
with altered fluorescent characteristics. In the excited state, electron density tends to shift
from the phenolate towards the carbonyl end of the chromophore. Therefore, placement of
increasing positive charge near the carbonyl end of the chromophore tends to decrease the
energy of the excited state and cause a red-shift in the absorbance and emission wavelength
maximum of the protein. Decreasing positive charge near the carbonyl end of the
chromophore tends to have the opposite effect, causing a blue-shift in the protein's
wavelengths.
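
The 0.5 nm criterion lends itself to a simple distance search over atomic coordinates. The sketch below is a minimal illustration, not the procedure used to prepare the tables in this section: the function name, residue labels and coordinates are all invented, and real coordinates would have to be supplied from a structure file in the same units (nm).

```python
import math


def residues_near(chromophore_atoms, residue_atoms, cutoff_nm=0.5):
    """Labels of residues with any atom within `cutoff_nm` of the chromophore.

    chromophore_atoms: list of (x, y, z) tuples, in nm.
    residue_atoms: dict mapping a residue label to a list of (x, y, z) tuples.
    """
    near = []
    for label, atoms in residue_atoms.items():
        if any(
            math.dist(atom, chromo) <= cutoff_nm
            for atom in atoms
            for chromo in chromophore_atoms
        ):
            near.append(label)
    return sorted(near)


# Invented coordinates (nm) purely for illustration.
chromophore = [(0.00, 0.00, 0.00), (0.14, 0.00, 0.00)]
residues = {
    "RES_A": [(0.30, 0.20, 0.00), (0.45, 0.25, 0.10)],  # close to the chromophore
    "RES_B": [(1.20, 0.00, 0.00)],                      # about 1 nm away
    "RES_C": [(0.00, 0.00, 0.64), (0.00, 0.10, 0.48)],  # second atom is within range
}
print(residues_near(chromophore, residues))
```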

Amino acids with charged (ionized D, E, K, and R), dipolar (H, N, Q, S, T,
and uncharged D, E and K), and polarizable side groups (e.g., C, F, H, M, W and Y) are
useful for altering the electronic environment of the chromophore, especially when they
substitute an amino acid with an uncharged, nonpolar or non-polarizable side chain. In
general, amino acids with polarizable side groups alter the electronic environment least and,
consequently, are expected to cause a comparatively smaller change in a fluorescent
property. Amino acids with charged side groups alter the environment most and,
consequently, are expected to cause a comparatively larger change in a fluorescent property.
However, amino acids with charged side groups are more likely to disrupt the structure of
the protein and to prevent proper folding if buried next to the chromophore without any
additional solvation or salt bridging. Therefore charged amino acids are most likely to be
tolerated and to give useful effects when they replace other charged or highly polar amino
acids that are already solvated or involved in salt bridges. In certain cases, where
substitution with a polarizable amino acid is chosen, the structure of the protein may make
selection of a larger amino acid, e.g., W, less appropriate. Alternatively, positions occupied
by amino acids with charged or polar side groups that are unfavorably oriented may be
substituted with amino acids that have less charged or polar side groups. In another
alternative, an amino acid whose side group has a dipole oriented in one direction in the
protein can be substituted with an amino acid having a dipole oriented in a different
direction.



More particularly, Table B lists several amino acids located within about 0.5
nm from the chromophore whose substitution can result in altered fluorescent
characteristics. The table indicates, underlined, preferred amino acid substitutions at the
indicated location to alter a fluorescent characteristic of the protein. In order to introduce
such substitutions, the table also provides codons for primers used in site-directed
mutagenesis involving amplification. These primers have been selected to encode
economically the preferred amino acids, but they encode other amino acids as well, as
indicated, or even a stop codon, denoted by Z. In introducing substitutions using such
degenerate primers the most efficient strategy is to screen the collection to identify mutants
with the desired properties and then sequence their DNA to find out which of the possible
substitutions is responsible. Codons are shown in double-stranded form with sense strand
above, antisense strand below. In nucleic acid sequences, R=(A or G); Y=(C or T); M=(A or
C); K=(G or T); S=(G or C); W=(A or T); H=(A, T, or C); B=(G, T, or C); V=(G, A, or C);
D=(G, A, or T); N=(A, C, G, or T).



TABLE B

Original position and presumed role                           Change to    Codon

L42   Aliphatic residue near C=N of chromophore               CFHLQRWYZ    5' YDS 3'
                                                                           3' RHS 5'

V61   Aliphatic residue near central -CH= of chromophore      FYHC LR      YDC
                                                                           RHG

T62   Almost directly above center of chromophore bridge      AVFS         KYC
                                                                           MRG
                                                              DEHKNQ       VAS
                                                                           BTS
                                                              FYHC LR      YDC
                                                                           RHG

V68   Aliphatic residue near carbonyl and G67                 FYH L        YWC
                                                                           RWG

N121  Near C-N site of ring closure between T65 and G67       CFHLQRWYZ    YDS
                                                                           RHS

Y145  Packs near tyrosine ring of chromophore                 WCFL         TKS
                                                                           AMS
                                                              DEHNKQ       VAS
                                                                           BTS

H148  H-bonds to phenolate oxygen                             FYNI         WWC
                                                                           WWG
                                                              KQR          MRG
                                                                           KYC

V150  Aliphatic residue near tyrosine ring of chromophore     FYH L        YWC
                                                                           RWG

F165  Packs near tyrosine ring                                CHQRWYZ      YRS
                                                                           RYS

I167  Aliphatic residue near phenolate; I167T has effects     FYHL         YWC
                                                                           RWG

T203  H-bonds to phenolic oxygen of chromophore               FHLQRWYZ     YDS
                                                                           RHS

E222  Protonation regulates ionization of chromophore         HKNQ         MAS
                                                                           KTS
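
The relationship between a degenerate primer codon and the set of residues it can introduce can be checked mechanically. The sketch below expands a degenerate codon using the nucleotide ambiguity codes defined above and translates each resulting codon. It assumes the standard genetic code and uses Z for a stop codon, as the table does; the function names are hypothetical.

```python
from itertools import product

# Ambiguity codes as defined in the text, plus the four plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "GC", "W": "AT",
    "H": "ATC", "B": "GTC", "V": "GAC", "D": "GAT", "N": "ACGT",
}
# Complement of each code, e.g. Y (C or T) pairs with R (G or A).
IUPAC_COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C", "R": "Y", "Y": "R",
    "M": "K", "K": "M", "S": "S", "W": "W", "H": "D", "D": "H",
    "B": "V", "V": "B", "N": "N",
}
# Standard genetic code (an assumption of this sketch); Z marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYYZZCCZWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def encoded_residues(degenerate_codon):
    """Sorted string of residues (Z = stop) encoded by a degenerate codon."""
    choices = (IUPAC[base] for base in degenerate_codon.upper())
    return "".join(sorted({CODON_TABLE["".join(c)] for c in product(*choices)}))


def aligned_antisense(degenerate_codon):
    """Antisense strand written 3' to 5', aligned under the sense codon."""
    return "".join(IUPAC_COMPLEMENT[base] for base in degenerate_codon.upper())


print(encoded_residues("YDS"), aligned_antisense("YDS"))  # CFHLQRWYZ RHS
print(encoded_residues("VAS"), aligned_antisense("VAS"))  # DEHKNQ BTS
```

The same check can be run on any codon pair in the tables of this section to confirm which substitutions, including stop codons, a given primer would introduce.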

Examples of amino acids with polar side groups that can be substituted with
polarizable side groups include, for example, those in Table C.

TABLE C

Original position and presumed role                             Change to    Codon

Q69   Terminates chain of H-bonding waters                      KREG         RRG
                                                                             YYC

Q94   H-bonds to carbonyl terminus of chromophore               DEHKNQ       VAS
                                                                             BTS

Q183  Bridges Arg96 and center of chromophore bridge            HY           YAC
                                                                             RTG
                                                                EK           RAG
                                                                             YTC

N185  Part of H-bond network near carbonyl of chromophore       DEHNKQ       VAS
                                                                             BTS

In another embodiment, an amino acid that is close to a second amino acid
within about 0.5 nm of the chromophore can, upon substitution, alter the electronic
properties of the second amino acid, in turn altering the electronic environment of the
chromophore. Table D presents two such amino acids. The amino acids, L220 and V224,
are close to E222 and oriented in the same direction in the β-pleated sheet.



TABLE D

Original position and presumed role                      Change to    Codon

L220  Packs next to Glu222; to make GFP pH sensitive     HKNPQT       MMS
                                                                      KKS

V224  Packs next to Glu222; to make GFP pH sensitive     HKNPQT       MMS
                                                                      KKS
                                                         CFHLQRWYZ    YDS
                                                                      RHS

One embodiment of the invention includes a nucleic acid molecule comprising a
nucleotide sequence encoding a functional engineered fluorescent protein whose amino acid
sequence is substantially identical to the amino acid sequence of Aequorea green fluorescent
protein (SEQ ID NO:2) and which differs from SEQ ID NO:2 by at least a substitution at
Q69, wherein the functional engineered fluorescent protein has a different fluorescent
property than Aequorea green fluorescent protein. Preferably, the substitution at Q69 is
selected from the group of K, R, E and G. The Q69 substitution can be combined with other
mutations to improve the properties of the protein, such as a functional mutation at S65.

One embodiment of the invention includes a nucleic acid molecule comprising a
nucleotide sequence encoding a functional engineered fluorescent protein whose amino acid
sequence is substantially identical to the amino acid sequence of Aequorea green fluorescent
protein (SEQ ID NO:2) and which differs from SEQ ID NO:2 by at least a substitution at
E222, but not including E222G, wherein the functional engineered fluorescent protein has a
different fluorescent property than Aequorea green fluorescent protein. Preferably, the
substitution at E222 is selected from the group of N and Q. The E222 substitution can be
combined with other mutations to improve the properties of the protein, such as a functional
mutation at F64.

One embodiment of the invention includes a nucleic acid molecule comprising a
nucleotide sequence encoding a functional engineered fluorescent protein whose amino acid
sequence is substantially identical to the amino acid sequence of Aequorea green fluorescent
protein (SEQ ID NO:2) and which differs from SEQ ID NO:2 by at least a substitution at
Y145, wherein the functional engineered fluorescent protein has a different fluorescent
property than Aequorea green fluorescent protein. Preferably, the substitution at Y145 is
selected from the group of W, C, F, L, E, H, K and Q. The Y145 substitution can be
combined with other mutations to improve the properties of the protein, such as a mutation
at Y66.

The invention also includes computer related embodiments, including computational
methods of using the crystal coordinates for designing new fluorescent protein mutations
and devices for storing the crystal data, including coordinates. For instance, the
invention includes a device comprising a storage device and, stored in the device, at least 10
atomic coordinates selected from the atomic coordinates listed in Figs. 5-1 to 5-28. More
coordinates can be stored depending on the complexity of the calculations or the objective
of using the coordinates (e.g., about 100, 1,000, or more coordinates). For example, larger
numbers of coordinates will be desirable for more detailed representations of fluorescent
protein structure. Typically, the storage device is a computer readable device that stores
code that receives the atomic coordinates as input. Other storage means known in the art
are also contemplated. The computer readable device can be a floppy disk or a
hard drive.

C. Production Of Long Wavelength Engineered Fluorescent Proteins

Recombinant production of a fluorescent protein involves expressing a
nucleic acid molecule having sequences that encode the protein.

In one embodiment, the nucleic acid encodes a fusion protein in which a
single polypeptide includes the fluorescent protein moiety within a longer polypeptide. The
longer polypeptide can include a second functional protein, such as a FRET partner or a
protein having a second function (e.g., an enzyme, antibody or other binding protein).
Nucleic acids that encode fluorescent proteins are useful as starting materials.

The fluorescent proteins can be produced as fusion proteins by recombinant
DNA technology. Recombinant production of fluorescent proteins involves expressing
nucleic acids having sequences that encode the proteins. Nucleic acids encoding fluorescent
proteins can be obtained by methods known in the art. Fluorescent proteins can be made by
site-specific mutagenesis of other nucleic acids encoding fluorescent proteins, or by random
mutagenesis caused by increasing the error rate of PCR of the original polynucleotide with
0.1 mM MnCl2 and unbalanced nucleotide concentrations. See, e.g., U.S. patent application
08/337,915, filed November 10, 1994 or International application PCT/US95/14692, filed
11/10/95. The nucleic acid encoding a green fluorescent protein can be isolated by
polymerase chain reaction of cDNA from A. victoria using primers based on the DNA
sequence of A. victoria green fluorescent protein, as presented in Fig. 3. PCR methods are
described in, for example, U.S. Pat. No. 4,683,195; Mullis et al. (1987) Cold Spring Harbor
Symp. Quant. Biol. 51:263; and Erlich, ed., PCR Technology, (Stockton Press, NY, 1989).

The construction of expression vectors and the expression of genes in
transfected cells involves the use of molecular cloning techniques also well known in the
art. Sambrook et al., Molecular Cloning - A Laboratory Manual, Cold Spring Harbor
Laboratory, Cold Spring Harbor, NY, (1989) and Current Protocols in Molecular Biology,
F.M. Ausubel et al., eds., (Current Protocols, a joint venture between Greene Publishing
Associates, Inc. and John Wiley & Sons, Inc.). The expression vector can be adapted for
function in prokaryotes or eukaryotes by inclusion of appropriate promoters, replication
sequences, markers, etc.

Nucleic acids used to transfect cells with sequences coding for expression of the
polypeptide of interest generally will be in the form of an expression vector including
expression control sequences operatively linked to a nucleotide sequence coding for
expression of the polypeptide. As used, the term "nucleotide sequence coding for
expression of" a polypeptide refers to a sequence that, upon transcription and translation of
mRNA, produces the polypeptide. This can include sequences containing, e.g., introns.
Expression control sequences are operatively linked to a nucleic acid sequence when the
expression control sequences control and regulate the transcription and, as appropriate,
translation of the nucleic acid sequence. Thus, expression control sequences can include
appropriate promoters, enhancers, transcription terminators, a start codon (i.e., ATG) in
front of a protein-encoding gene, splicing signals for introns, maintenance of the correct
reading frame of that gene to permit proper translation of the mRNA, and stop codons.
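
Three of the requirements listed above (a start codon, a maintained reading frame and a stop codon) can be checked on a candidate coding sequence with a few lines of code. The sketch below is an illustration under stated assumptions: the function name and test sequences are hypothetical, and the input is assumed to be an intron-free DNA coding sequence, so splicing signals, promoters and other control elements are outside its scope.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_coding_sequence(dna):
    """Return a list of problems found in an intron-free coding sequence."""
    dna = dna.upper()
    problems = []
    if not dna.startswith("ATG"):
        problems.append("missing ATG start codon")
    if len(dna) % 3 != 0:
        problems.append("length is not a multiple of three")
    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing terminal stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("premature stop codon in frame")
    return problems


print(check_coding_sequence("ATGAGCAAGTAA"))  # no problems reported
print(check_coding_sequence("ATGAGCAAG"))     # lacks a stop codon
print(check_coding_sequence("ATGTAAAAGTAA"))  # stop codon inside the frame
```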

Methods which are well known to those skilled in the art can be used to
construct expression vectors containing the fluorescent protein coding sequence and
appropriate transcriptional/translational control signals. These methods include in vitro
recombinant DNA techniques, synthetic techniques and in vivo recombination/genetic
recombination. (See, for example, the techniques described in Maniatis, et al., Molecular
Cloning - A Laboratory Manual, Cold Spring Harbor Laboratory, N.Y., 1989).

Transformation of a host cell with recombinant DNA may be carried out by
conventional techniques as are well known to those skilled in the art. Where the host is
prokaryotic, such as E. coli, competent cells which are capable of DNA uptake can be
prepared from cells harvested after exponential growth phase and subsequently treated by
the CaCl2 method by procedures well known in the art. Alternatively, MgCl2 or RbCl can
be used. Transformation can also be performed after forming a protoplast of the host cell or
by electroporation.

When the host is a eukaryote, such methods of transfection of DNA as calcium
phosphate co-precipitates, conventional mechanical procedures such as microinjection,
electroporation, insertion of a plasmid encased in liposomes, or virus vectors may be used.
Eukaryotic cells can also be cotransfected with DNA sequences encoding the fusion
polypeptide of the invention, and a second foreign DNA molecule encoding a selectable
phenotype, such as the herpes simplex thymidine kinase gene. Another method is to use a
eukaryotic viral vector, such as simian virus 40 (SV40) or bovine papilloma virus, to
transiently infect or transform eukaryotic cells and express the protein. (Eukaryotic Viral
Vectors, Cold Spring Harbor Laboratory, Gluzman ed., 1982). Preferably, a eukaryotic host
is utilized as the host cell as described herein.

Isolation and purification of either microbially or eukaryotically expressed
polypeptides of the invention may be carried out by any conventional means such as, for
example, preparative chromatographic separations and immunological separations such as
those involving the use of monoclonal or polyclonal antibodies or antigen.

In one embodiment, recombinant fluorescent proteins can be produced by expression
of nucleic acid encoding the protein in E. coli. Aequorea-related fluorescent proteins are
best expressed by cells cultured between about 15°C and 30°C, but higher temperatures
(e.g., 37°C) are possible. After synthesis, these proteins are stable at higher temperatures
(e.g., 37°C) and can be used in assays at those temperatures.

A variety of host-expression vector systems may be utilized to express a
fluorescent protein coding sequence. These include but are not limited to microorganisms
such as bacteria transformed with recombinant bacteriophage DNA, plasmid DNA or
cosmid DNA expression vectors containing a fluorescent protein coding sequence; yeast
transformed with recombinant yeast expression vectors containing the fluorescent protein
coding sequence; plant cell systems infected with recombinant virus expression vectors
(e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV) or transformed with
recombinant plasmid expression vectors (e.g., Ti plasmid) containing a fluorescent protein
coding sequence; insect cell systems infected with recombinant virus expression vectors
(e.g., baculovirus) containing a fluorescent protein coding sequence; or animal cell systems
infected with recombinant virus expression vectors (e.g., retroviruses, adenovirus, vaccinia
virus) containing a fluorescent protein coding sequence, or transformed animal cell systems
engineered for stable expression.

Depending on the host/vector system utilized, any of a number of suitable
transcription and translation elements, including constitutive and inducible promoters,
transcription enhancer elements, transcription terminators, etc. may be used in the
expression vector (see, e.g., Bitter, et al., Methods in Enzymology 153:516-544, 1987). For
example, when cloning in bacterial systems, inducible promoters such as pL of
bacteriophage λ, plac, ptrp, ptac (ptrp-lac hybrid promoter) and the like may be used.
When cloning in mammalian cell systems, promoters derived from the genome of
mammalian cells (e.g., metallothionein promoter) or from mammalian viruses (e.g., the
retrovirus long terminal repeat; the adenovirus late promoter; the vaccinia virus 7.5K
promoter) may be used. Promoters produced by recombinant DNA or synthetic techniques
may also be used to provide for transcription of the inserted fluorescent protein coding
sequence.

In bacterial systems a number of expression vectors may be advantageously
selected depending upon the use intended for the fluorescent protein expressed. For
example, when large quantities of the fluorescent protein are to be produced, vectors which
direct the expression of high levels of fusion protein products that are readily purified may
be desirable. Those which are engineered to contain a cleavage site to aid in recovering
fluorescent protein are preferred.

In yeast, a number of vectors containing constitutive or inducible promoters may
be used. For a review see Current Protocols in Molecular Biology, Vol. 2, Ed. Ausubel, et
al., Greene Publish. Assoc. & Wiley Interscience, Ch. 13, 1988; Grant, et al., Expression
and Secretion Vectors for Yeast, in Methods in Enzymology, Eds. Wu & Grossman,
Acad. Press, N.Y., Vol. 153, pp. 516-544, 1987; Glover, DNA Cloning, Vol. II, IRL Press,
Wash., D.C., Ch. 3, 1986; Bitter, Heterologous Gene Expression in Yeast, Methods in
Enzymology, Eds. Berger & Kimmel, Acad. Press, N.Y., Vol. 152, pp. 673-684, 1987; and
The Molecular Biology of the Yeast Saccharomyces, Eds. Strathern et al., Cold Spring
Harbor Press, Vols. I and II, 1982. A constitutive yeast promoter such as ADH or LEU2 or
an inducible promoter such as GAL may be used (Cloning in Yeast, Ch. 3, R. Rothstein In:
DNA Cloning Vol. II, A Practical Approach, Ed. DM Glover, IRL Press, Wash., D.C.,
1986). Alternatively, vectors may be used which promote integration of foreign DNA
sequences into the yeast chromosome.

In cases where plant expression vectors are used, the expression of a fluorescent
protein coding sequence may be driven by any of a number of promoters. For example,
viral promoters such as the 35S RNA and 19S RNA promoters of CaMV (Brisson, et al.,
Nature 310:511-514, 1984), or the coat protein promoter of TMV (Takamatsu, et al., EMBO
J. 6:307-311, 1987) may be used; alternatively, plant promoters such as that of the small
subunit of RUBISCO (Coruzzi, et al., 1984, EMBO J. 3:1671-1680; Broglie, et al., Science
224:838-843, 1984); or heat shock promoters, e.g., soybean hsp17.5-E or hsp17.3-B (Gurley,
et al., Mol. Cell. Biol. 6:559-565, 1986) may be used. These constructs can be introduced into
plant cells using Ti plasmids, Ri plasmids, plant virus vectors, direct DNA transformation,
microinjection, electroporation, etc. For reviews of such techniques see, for example,
Weissbach & Weissbach, Methods for Plant Molecular Biology, Academic Press, NY,
Section VIII, pp. 421-463, 1988; and Grierson & Corey, Plant Molecular Biology, 2d Ed.,
Blackie, London, Ch. 7-9, 1988.

An alternative expression system which could be used to express fluorescent
protein is an insect system. In one such system, Autographa californica nuclear
polyhedrosis virus (AcNPV) is used as a vector to express foreign genes. The virus grows in
Spodoptera frugiperda cells. The fluorescent protein coding sequence may be cloned into
non-essential regions (for example, the polyhedrin gene) of the virus and placed under
control of an AcNPV promoter (for example, the polyhedrin promoter). Successful insertion
of the fluorescent protein coding sequence will result in inactivation of the polyhedrin gene
and production of non-occluded recombinant virus (i.e., virus lacking the proteinaceous coat
coded for by the polyhedrin gene). These recombinant viruses are then used to infect
Spodoptera frugiperda cells in which the inserted gene is expressed; see Smith, et al., J.
Virol. 46:584, 1983; Smith, U.S. Patent No. 4,215,051.

Eukaryotic systems, and preferably mammalian expression systems, allow for
proper post-translational modifications of expressed mammalian proteins to occur.
Eukaryotic cells which possess the cellular machinery for proper processing of the primary
transcript, glycosylation, phosphorylation, and, advantageously, secretion of the gene
product should be used as host cells for the expression of fluorescent protein. Such host
cell lines may include but are not limited to CHO, VERO, BHK, HeLa, COS, MDCK,
Jurkat, HEK-293, and WI38.

Mammalian cell systems which utilize recombinant viruses or viral elements to
direct expression may be engineered. For example, when using adenovirus expression
vectors, the fluorescent protein coding sequence may be ligated to an adenovirus
transcription/translation control complex, e.g., the late promoter and tripartite leader
sequence. This chimeric gene may then be inserted in the adenovirus genome by in vitro or
in vivo recombination. Insertion in a non-essential region of the viral genome (e.g., region
E1 or E3) will result in a recombinant virus that is viable and capable of expressing the
fluorescent protein in infected hosts (e.g., see Logan & Shenk, Proc. Natl. Acad. Sci. USA,
81: 3655-3659, 1984). Alternatively, the vaccinia virus 7.5K promoter may be used (e.g.,
see Mackett, et al., Proc. Natl. Acad. Sci. USA, 79: 7415-7419, 1982; Mackett, et al., J.
Virol. 49: 857-864, 1984; Panicali, et al., Proc. Natl. Acad. Sci. USA 79: 4927-4931, 1982).
Of particular interest are vectors based on bovine papilloma virus which have the ability to
replicate as extrachromosomal elements (Sarver, et al., Mol. Cell. Biol. 1: 486, 1981).
Shortly after entry of this DNA into mouse cells, the plasmid replicates to about 100 to 200
copies per cell. Transcription of the inserted cDNA does not require integration of the
plasmid into the host's chromosome, thereby yielding a high level of expression. These
vectors can be used for stable expression by including a selectable marker in the plasmid,
such as the neo gene. Alternatively, the retroviral genome can be modified for use as a
vector capable of introducing and directing the expression of the fluorescent protein gene in
host cells (Cone & Mulligan, Proc. Natl. Acad. Sci. USA, 81:6349-6353, 1984). High level
expression may also be achieved using inducible promoters, including, but not limited to,
the metallothionein IIA promoter and heat shock promoters.

The invention can also include a localization sequence, such as a nuclear
localization sequence, an endoplasmic reticulum localization sequence, a peroxisome
localization sequence, a mitochondrial localization sequence, or a localized protein.
Localization sequences can be targeting sequences which are described, for example, in
"Protein Targeting", chapter 35 of Stryer, L., Biochemistry (4th ed.). W.H. Freeman, 1995.
The localization sequence can also be a localized protein. Some important localization
sequences include those targeting the nucleus (KKKRK), mitochondrion (amino terminal
MLRTSSLFTRRVQPSLFRMLRLQST-), endoplasmic reticulum (KDEL at C-terminus,
assuming a signal sequence present at N-terminus), peroxisome (SKF at C-terminus),
prenylation or insertion into plasma membrane (CaaX, CC, CXC, or CCXX at C-terminus),
cytoplasmic side of plasma membrane (fusion to SNAP-25), or the Golgi apparatus (fusion
to furin).

For long-term, high-yield production of recombinant proteins, stable expression
is preferred. Rather than using expression vectors which contain viral origins of replication,
host cells can be transformed with the fluorescent protein cDNA controlled by appropriate
expression control elements (e.g., promoter, enhancer sequences, transcription terminators,
polyadenylation sites, etc.), and a selectable marker. The selectable marker in the
recombinant plasmid confers resistance to the selection and allows cells to stably integrate
the plasmid into their chromosomes and grow to form foci which in turn can be cloned and
expanded into cell lines. For example, following the introduction of foreign DNA,
engineered cells may be allowed to grow for 1-2 days in an enriched medium, and then are
switched to a selective medium. A number of selection systems may be used, including but
not limited to the herpes simplex virus thymidine kinase (Wigler, et al., Cell, 11: 223,
1977), hypoxanthine-guanine phosphoribosyltransferase (Szybalska & Szybalski, Proc.
Natl. Acad. Sci. USA, 48:2026, 1962), and adenine phosphoribosyltransferase (Lowy, et al.,
Cell, 22: 817, 1980) genes, which can be employed in tk-, hgprt- or aprt- cells respectively.
Also, antimetabolite resistance can be used as the basis of selection for dhfr, which confers
resistance to methotrexate (Wigler, et al., Proc. Natl. Acad. Sci. USA, 77: 3567, 1980;
O'Hare, et al., Proc. Natl. Acad. Sci. USA, 78: 1527, 1981); gpt, which confers resistance to
mycophenolic acid (Mulligan & Berg, Proc. Natl. Acad. Sci. USA, 78: 2072, 1981); neo,
which confers resistance to the aminoglycoside G-418 (Colberre-Garapin, et al., J. Mol.
Biol., 150:1, 1981); and hygro, which confers resistance to hygromycin (Santerre, et al.,
Gene, 30: 147, 1984) genes. Recently, additional selectable genes have been described,
namely trpB, which allows cells to utilize indole in place of tryptophan; hisD, which allows
cells to utilize histidinol in place of histidine (Hartman & Mulligan, Proc. Natl. Acad. Sci.
USA, 85:8047, 1988); and ODC (ornithine decarboxylase), which confers resistance to the
ornithine decarboxylase inhibitor, 2-(difluoromethyl)-DL-ornithine, DFMO (McConlogue
L., In: Current Communications in Molecular Biology, Cold Spring Harbor Laboratory, ed.,
1987).

DNA sequences encoding the fluorescent protein polypeptide of the invention
can be expressed in vitro by DNA transfer into a suitable host cell. "Host cells" are cells in
which a vector can be propagated and its DNA expressed. The term also includes any
progeny of the subject host cell. It is understood that all progeny may not be identical to the
parental cell since there may be mutations that occur during replication. However, such
progeny are included when the term "host cell" is used. Methods of stable transfer, in other
words when the foreign DNA is continuously maintained in the host, are known in the art.

The expression vector can be transfected into a host cell for expression of the
recombinant nucleic acid. Host cells can be selected for high levels of expression in order
to purify the fluorescent protein fusion protein. E. coli is useful for this purpose.
Alternatively, the host cell can be a prokaryotic or eukaryotic cell selected to study the
activity of an enzyme produced by the cell. In this case, the linker peptide is selected to
include an amino acid sequence recognized by the protease. The cell can be, e.g., a cultured
cell or a cell in vivo.

A primary advantage of fluorescent protein fusion proteins is that they are
prepared by normal protein biosynthesis, thus completely avoiding organic synthesis and
the requirement for customized unnatural amino acid analogs. The constructs can be
expressed in E. coli in large scale for in vitro assays. Purification from bacteria is simplified
when the sequences include polyhistidine tags for one-step purification by nickel-chelate
chromatography. Alternatively, the substrates can be expressed directly in a desired host
cell for assays in situ.

In another embodiment, the invention provides a transgenic non-human animal
that expresses a nucleic acid sequence which encodes the fluorescent protein.

The "non-human animals" of the invention comprise any non-human animal
having a nucleic acid sequence which encodes a fluorescent protein. Such non-human animals
include vertebrates such as rodents, non-human primates, sheep, dog, cow, pig, amphibians,
and reptiles. Preferred non-human animals are selected from the rodent family including rat
and mouse, most preferably mouse. The "transgenic non-human animals" of the invention
are produced by introducing "transgenes" into the germline of the non-human animal.
Embryonal target cells at various developmental stages can be used to introduce transgenes.
Different methods are used depending on the stage of development of the embryonal target
cell. The zygote is the best target for micro-injection. In the mouse, the male pronucleus
reaches the size of approximately 20 micrometers in diameter, which allows reproducible
injection of 1-2 pl of DNA solution. The use of zygotes as a target for gene transfer has a
major advantage in that in most cases the injected DNA will be incorporated into the host
genome before the first cleavage (Brinster et al., Proc. Natl. Acad. Sci. USA 82:4438-4442,
1985). As a consequence, all cells of the transgenic non-human animal will carry the
incorporated transgene. This will in general also be reflected in the efficient transmission of
the transgene to offspring of the founder since 50% of the germ cells will harbor the
transgene. Microinjection of zygotes is the preferred method for incorporating transgenes in
practicing the invention.

The term "transgenic" is used to describe an animal which includes exogenous
genetic material within all of its cells. A "transgenic" animal can be produced by
cross-breeding two chimeric animals which include exogenous genetic material within cells
used in reproduction. Twenty-five percent of the resulting offspring will be transgenic, i.e.,
animals which include the exogenous genetic material within all of their cells in both
alleles. 50% of the resulting animals will include the exogenous genetic material within one
allele and 25% will include no exogenous genetic material.
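
The 25%/50%/25% split stated above is the ordinary Mendelian outcome of a single-locus cross. The following is a minimal Python sketch, assuming for illustration only that each parent transmits the exogenous material as if it were carried on one of two alleles; the labels "T" (allele carrying the transgene) and "-" (allele without it) and the function name cross are hypothetical and do not appear in the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent_a: str, parent_b: str) -> dict:
    """Return offspring genotype frequencies for a single-locus cross.

    Each parent is a two-character string, one character per allele.
    """
    # Enumerate every pairing of one allele from each parent (a Punnett square).
    counts = Counter("".join(sorted(pair)) for pair in product(parent_a, parent_b))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Hypothetical labels: "T" = allele with the transgene, "-" = allele without it.
ratios = cross("T-", "T-")
for genotype, frequency in sorted(ratios.items()):
    print(genotype, frequency)
```

Under these assumptions the genotype "TT" corresponds to the animals carrying the material in both alleles, "-T" to those carrying it in one allele, and "--" to those carrying none.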

Retroviral infection can also be used to introduce a transgene into a non-human
animal. The developing non-human embryo can be cultured in vitro to the blastocyst stage.
During this time, the blastomeres can be targets for retroviral infection (Jaenisch, R., Proc.
Natl. Acad. Sci. USA 73:1260-1264, 1976). Efficient infection of the blastomeres is obtained
by enzymatic treatment to remove the zona pellucida (Hogan, et al. (1986) in Manipulating
the Mouse Embryo, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.). The
viral vector system used to introduce the transgene is typically a replication-defective
retrovirus carrying the transgene (Jahner, et al., Proc. Natl. Acad. Sci. USA 82:6927-6931, 1985;
Van der Putten, et al., Proc. Natl. Acad. Sci. USA 82:6148-6152, 1985). Transfection is
easily and efficiently obtained by culturing the blastomeres on a monolayer of
virus-producing cells (Van der Putten, supra; Stewart, et al., EMBO J. 6:383-388, 1987).
Alternatively, infection can be performed at a later stage. Virus or virus-producing cells can
be injected into the blastocoele (D. Jahner et al., Nature 298:623-628, 1982). Most of the
founders will be mosaic for the transgene since incorporation occurs only in a subset of the
cells which formed the transgenic non-human animal. Further, the founder may contain
various retroviral insertions of the transgene at different positions in the genome which
generally will segregate in the offspring. In addition, it is also possible to introduce
transgenes into the germ line, albeit with low efficiency, by intrauterine retroviral infection
of the midgestation embryo (D. Jahner et al., supra).

A third type of target cell for transgene introduction is the embryonal stem cell
(ES). ES cells are obtained from pre-implantation embryos cultured in vitro and fused with
embryos (M. J. Evans et al., Nature 292:154-156, 1981; M.O. Bradley et al., Nature 309:
255-258, 1984; Gossler, et al., Proc. Natl. Acad. Sci. USA 83: 9065-9069, 1986; and
Robertson et al., Nature 322:445-448, 1986). Transgenes can be efficiently introduced into
the ES cells by DNA transfection or by retrovirus-mediated transduction. Such transformed
ES cells can thereafter be combined with blastocysts from a non-human animal. The ES cells
thereafter colonize the embryo and contribute to the germ line of the resulting chimeric
animal. (For review see Jaenisch, R., Science 240: 1468-1474, 1988).

"Transformed" means a cell into which (or into an ancestor of which) has been
introduced, by means of recombinant nucleic acid techniques, a heterologous nucleic acid
molecule. "Heterologous" refers to a nucleic acid sequence that either originates from
another species or is modified from either its original form or the form primarily expressed
in the cell.

"Transgene" means any piece of DNA which is inserted by artifice into a cell,
and becomes part of the genome of the organism (i.e., either stably integrated or as a stable
extrachromosomal element) which develops from that cell. Such a transgene may include a
gene which is partly or entirely heterologous (i.e., foreign) to the transgenic organism, or
may represent a gene homologous to an endogenous gene of the organism. Included within
this definition is a transgene created by the providing of an RNA sequence which is
transcribed into DNA and then incorporated into the genome. The transgenes of the
invention include DNA sequences which encode the fluorescent protein
which may be expressed in a transgenic non-human animal. The term "transgenic" as used
herein additionally includes any organism whose genome has been altered by in vitro
manipulation of the early embryo or fertilized egg or by any transgenic technology to induce
a specific gene knockout. The term "gene knockout" as used herein refers to the targeted
disruption of a gene in vivo with complete loss of function that has been achieved by any
transgenic technology familiar to those in the art. In one embodiment, transgenic animals
having gene knockouts are those in which the target gene has been rendered non-functional
by an insertion targeted to the gene to be rendered non-functional by homologous
recombination. As used herein, the term "transgenic" includes any transgenic technology
familiar to those in the art which can produce an organism carrying an introduced transgene
or one in which an endogenous gene has been rendered non-functional or "knocked out."


III. USES OF ENGINEERED FLUORESCENT PROTEINS 

The proteins of this invention are useful in any method that employs
fluorescent proteins.

The engineered fluorescent proteins of this invention are useful as
fluorescent markers in the many ways fluorescent markers already are used. This includes,
for example, coupling engineered fluorescent proteins to antibodies, nucleic acids or other
receptors for use in detection assays, such as immunoassays or hybridization assays.

The engineered fluorescent proteins of this invention are useful to track the
movement of proteins in cells. In this embodiment, a nucleic acid molecule encoding the
fluorescent protein is fused to a nucleic acid molecule encoding the protein of interest in an
expression vector. Upon expression inside the cell, the protein of interest can be localized
based on fluorescence. In another version, two proteins of interest are fused with two
engineered fluorescent proteins having different fluorescent characteristics.

The engineered fluorescent proteins of this invention are useful in systems to
detect induction of transcription. In certain embodiments, a nucleotide sequence encoding
the engineered fluorescent protein is fused to expression control sequences of interest and
the expression vector is transfected into a cell. Induction of the promoter can be measured
by detecting the expression and/or quantity of fluorescence. Such constructs can be
used to follow signaling pathways from receptor to promoter.

The engineered fluorescent proteins of this invention are useful in
applications involving FRET. Such applications can detect events as a function of the
movement of fluorescent donor and acceptor towards or away from each other. One or
both of the donor/acceptor pair can be a fluorescent protein. A preferred donor and acceptor
pair for FRET-based assays is a donor with a T203I mutation and an acceptor with the
mutation T203X, wherein X is an aromatic amino acid, especially T203Y, T203W, or
T203H. In a particularly useful pair the donor contains the following mutations: S72A,
K79R, Y145F, M153A and T203I (with an excitation peak of 395 nm and an emission peak
of 511 nm) and the acceptor contains the following mutations: S65G, S72A, K79R, and
T203Y. This particular pair provides a wide separation between the excitation and emission
peaks of the donor and provides good overlap between the donor emission spectrum and the
acceptor excitation spectrum. Other red-shifted mutants, such as those described herein,
can also be used as the acceptor in such a pair.
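
The mutation labels used above follow the usual convention of wild-type residue, position, replacement residue (for example, T203Y replaces threonine 203 with tyrosine). The following is a minimal Python sketch of how such labels can be parsed and applied; the function names and the five-residue toy peptide are hypothetical and are not the GFP sequence, while the donor and acceptor mutation lists are copied from the paragraph above.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter replacement.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> tuple:
    """Split a label such as 'T203Y' into ('T', 203, 'Y')."""
    match = MUTATION_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_mutations(sequence: str, labels: list) -> str:
    """Apply point mutations to a sequence, checking each wild-type residue."""
    residues = list(sequence)
    for label in labels:
        wild_type, position, replacement = parse_mutation(label)
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: residue {position} is not {wild_type}")
        residues[position - 1] = replacement
    return "".join(residues)


# Mutation sets named in the text for the particularly useful FRET pair.
DONOR = ["S72A", "K79R", "Y145F", "M153A", "T203I"]
ACCEPTOR = ["S65G", "S72A", "K79R", "T203Y"]

shared = sorted(set(DONOR) & set(ACCEPTOR))
print("shared mutations:", shared)

# Toy peptide for illustration only (NOT the GFP sequence).
toy = apply_mutations("MSKGT", ["S2A", "T5Y"])
print(toy)
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors, which are easy to make when a construct carries an N-terminal tag.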

In one aspect, FRET is used to detect the cleavage of a substrate having the
donor and acceptor coupled to the substrate on opposite sides of the cleavage site. Upon
cleavage of the substrate, the donor/acceptor pair physically separates, eliminating FRET.
Assays involve contacting the substrate with a sample, and determining a qualitative or
quantitative change in FRET. In one embodiment, the engineered fluorescent protein is
used in a substrate for β-lactamase. Examples of such substrates are described in United
States patent application 08/407,544, filed March 20, 1995 and International Application
PCT/US96/04059, filed March 20, 1996. In another embodiment, an engineered fluorescent
protein donor/acceptor pair are part of a fusion protein coupled by a peptide having a
proteolytic cleavage site. Such tandem fluorescent proteins are described in United States
patent application 08/594,575, filed January 31, 1996.
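
Why cleavage eliminates FRET can be illustrated with the standard Förster relation, E = 1 / (1 + (r/R0)^6), where r is the donor-acceptor distance and R0 is the distance at which transfer is 50% efficient. The relation itself is not given in this specification; the sketch below, including the value chosen for R0 and the example distances, is an assumption for illustration only.

```python
def fret_efficiency(distance_nm: float, forster_radius_nm: float) -> float:
    """Forster relation: E = 1 / (1 + (r / R0)**6)."""
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be >= 0 and R0 must be > 0")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


# Hypothetical Forster radius; not a value taken from the text.
R0_NM = 5.0

# Short distances stand in for an intact substrate, long ones for cleaved fragments.
for r_nm in (3.0, 5.0, 10.0, 50.0):
    print(f"r = {r_nm:5.1f} nm  E = {fret_efficiency(r_nm, R0_NM):.4f}")
```

Because efficiency falls with the sixth power of distance, fragments that diffuse only a few multiples of R0 apart already show essentially no transfer, which is what makes the cleavage readout sharp.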

In another aspect, FRET is used to detect changes in potential across a
membrane. A donor and acceptor are placed on opposite sides of a membrane such that one
translocates across the membrane in response to a voltage change. This creates a measurable
change in FRET. Such a method is described in United States patent application 08/481,977,
filed June 7, 1995 and International Application PCT/US96/09652, filed June 6, 1996.

The engineered proteins of this invention are useful in the creation of
fluorescent substrates for protein kinases. Such substrates incorporate an amino acid
sequence recognizable by protein kinases. Upon phosphorylation, the engineered
fluorescent protein undergoes a change in a fluorescent property. Such substrates are useful
in detecting and measuring protein kinase activity in a sample or a cell, upon transfection
and expression of the substrate. Preferably, the kinase recognition site is placed within
about 20 amino acids of a terminus of the engineered fluorescent protein. The kinase
recognition site also can be placed in a loop domain of the protein. (See, e.g., Figure 1B.)
Methods for making fluorescent substrates for protein kinases are described in United States
patent application 08/680,877, filed July 16, 1996.

A protease recognition site also can be introduced into a loop domain. Upon
cleavage, a fluorescent property changes in a measurable fashion.

The invention also includes a method of identifying a test chemical. Typically, the
method includes contacting a test chemical with a sample containing a biological entity
labeled with a functional, engineered fluorescent protein or a polynucleotide encoding said
functional, engineered fluorescent protein. By monitoring fluorescence (i.e., a fluorescent
property) from the sample containing the functional engineered fluorescent protein it can be
determined whether a test chemical is active. Controls can be included to ensure the
specificity of the signal. Such controls include measurements of a fluorescent property in
the absence of the test chemical, in the presence of a chemical with an expected activity
(e.g., a known modulator) or engineered controls (e.g., absence of engineered fluorescent
protein, absence of engineered fluorescent protein polynucleotide or the absence of operable
linkage of the engineered fluorescent protein).

The fluorescence in the presence of a test chemical can be greater or less than in the
absence of said test chemical. For instance, if the engineered fluorescent protein is used as a
reporter of gene expression, the test chemical may up or down regulate gene expression.
For such types of screening, the polynucleotide encoding the functional, engineered
fluorescent protein is operatively linked to a genomic polynucleotide or a re. Alternatively,
the functional, engineered fluorescent protein is fused to a second functional protein. This
embodiment can be used to track localization of the second protein or to track
protein-protein interactions using energy transfer.
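
The comparison against controls described above can be sketched numerically. The following minimal Python example is one possible way to score a well, assuming a background reading from a control lacking the fluorescent protein and a vehicle reading taken in the absence of the test chemical; the function names, the 1.5-fold threshold, and the example readings are all hypothetical and are not taken from the text.

```python
def fold_change(test_signal: float, vehicle_signal: float, background: float = 0.0) -> float:
    """Background-corrected ratio of fluorescence with and without the test chemical."""
    corrected_test = test_signal - background
    corrected_vehicle = vehicle_signal - background
    if corrected_vehicle <= 0:
        raise ValueError("vehicle control must exceed background")
    return corrected_test / corrected_vehicle


def classify(ratio: float, threshold: float = 1.5) -> str:
    """Label a ratio as up-regulating, down-regulating, or inactive.

    The symmetric threshold is an arbitrary illustrative choice.
    """
    if ratio >= threshold:
        return "up"
    if ratio <= 1.0 / threshold:
        return "down"
    return "inactive"


# Hypothetical plate readings in arbitrary fluorescence units.
BACKGROUND = 50.0   # control lacking the engineered fluorescent protein
VEHICLE = 150.0     # reporter cells without test chemical

for name, reading in [("chemical A", 350.0), ("chemical B", 80.0), ("chemical C", 160.0)]:
    ratio = fold_change(reading, VEHICLE, BACKGROUND)
    print(name, round(ratio, 2), classify(ratio))
```

In practice the threshold would be set from the variability of replicate control wells rather than fixed in advance.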

IV. PROCEDURES

Fluorescence in a sample is measured using a fluorimeter. In general,
excitation radiation, from an excitation source having a first wavelength, passes through
excitation optics. The excitation optics cause the excitation radiation to excite the sample.
In response, fluorescent proteins in the sample emit radiation which has a wavelength that is
different from the excitation wavelength. Collection optics then collect the emission from
the sample. The device can include a temperature controller to maintain the sample at a
specific temperature while it is being scanned. According to one embodiment, a multi-axis
translation stage moves a microtiter plate holding a plurality of samples in order to position
different wells to be exposed. The multi-axis translation stage, temperature controller,
auto-focusing feature, and electronics associated with imaging and data collection can be
managed by an appropriately programmed digital computer. The computer also can
transform the data collected during the assay into another format for presentation. This
process can be miniaturized and automated to enable screening many thousands of
compounds.

Methods of performing assays on fluorescent materials are well known in the 
art and are described in, e.g., Lakowicz, J.R., Principles of Fluorescence Spectroscopy, New 
York:Plenum Press (1983); Herman, B., Resonance energy transfer microscopy, in: 
Fluorescence Microscopy of Living Cells in Culture, Part B, Methods in Cell Biology, vol. 
30, ed. Taylor, D.L. & Wang, Y.-L., San Diego: Academic Press (1989), pp. 219-243; 
Turro, N.J., Modern Molecular Photochemistry, Menlo Park: Benjamin/Cummings
Publishing Co., Inc. (1978), pp. 296-361.

The following examples are provided by way of illustration, not by way of
limitation.

EXAMPLES 

As a step in understanding the properties of GFP, and to aid in the tailoring 
of GFPs with altered characteristics, we have determined the three dimensional structure at 
1.9 Å resolution of the S65T mutant (R. Heim et al. Nature 373:664-665 (1995)) of A.
victoria GFP. This mutant also contains the ubiquitous Q80R substitution, which 
accidentally occurred in the early distribution of the GFP cDNA and is not known to have 
any effect on the protein properties (M. Chalfie et al. Science 263:802-805 (1994)). 

Histidine-tagged S65T GFP (R. Heim et al. Nature 373:664-665 (1995)) was
overexpressed in JM109/pRSET B in 4 l YT broth plus ampicillin at 37 °C, 450 rpm and 5
l/min air flow. The temperature was reduced to 25 °C at A595 = 0.3, followed by induction
with 1 mM isopropylthiogalactoside for 5 h. Cell paste was stored at -80 °C overnight, then
was resuspended in 50 mM HEPES pH 7.9, 0.3 M NaCl, 5 mM 2-mercaptoethanol, 0.1 mM
phenylmethylsulfonylfluoride (PMSF), passed once through a French press at 10,000 psi,
then centrifuged at 20,000 rpm for 45 min. The supernatant was applied to a Ni-NTA-agarose
column (Qiagen), followed by a wash with 20 mM imidazole, then eluted with 100 mM
imidazole. Green fractions were pooled and subjected to chymotryptic (Sigma) proteolysis
(1:50 w/w) for 22 h at room temperature (RT). After addition of 0.5 mM PMSF, the digest
was reapplied to the Ni column. N-terminal sequencing verified the presence of the correct
N-terminal methionine. After dialysis against 20 mM HEPES, pH 7.5 and concentration to
A490 = 20, rod-shaped crystals were obtained at RT within 5 days in hanging drops
containing 5 µl protein and 5 µl well solution (22-26% PEG 4000 (Serva), 50 mM HEPES
pH 8.0-8.5, 50 mM MgCl2 and 10 mM 2-mercaptoethanol). Crystals were 0.05 mm across and
up to 1.0 mm long. The space group is P2₁2₁2₁ with a = 51.8, b = 62.8, c = 70.7 Å, Z = 4.
Two crystal forms of wild-type GFP, unrelated to the present form, have been described by
M. A. Perozo, K. B. Ward, R. B. Thompson, & W. W. Ward. J. Biol. Chem. 263:7713-7716
(1988).
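As a rough consistency check on the crystal parameters, the unit cell volume and an estimated solvent fraction can be computed from the cell edges and Z. The sketch below uses the Matthews-type relation (cell volume per dalton of protein, with the conventional constant 1.23 for converting to a protein volume fraction). The protein mass is a placeholder assumption, because the text does not state the mass of the chymotrypsin-treated protein that crystallized; the function names are illustrative only.

```python
def orthorhombic_cell_volume(a, b, c):
    """Volume in cubic angstroms of an orthorhombic cell (all angles 90 degrees)."""
    return a * b * c


def matthews_coefficient(cell_volume, z, molecular_mass_da):
    """Cell volume per dalton of protein (cubic angstroms per Da)."""
    return cell_volume / (z * molecular_mass_da)


def solvent_fraction(vm, protein_constant=1.23):
    """Estimated solvent fraction from the Matthews coefficient.

    The constant 1.23 is the conventional value for a typical protein
    density; it is an assumption, not a number taken from the text.
    """
    return 1.0 - protein_constant / vm


# Cell edges (angstroms) and Z as reported for the P2(1)2(1)2(1) crystals.
volume = orthorhombic_cell_volume(51.8, 62.8, 70.7)

# ASSUMPTION: placeholder mass; the mass of the crystallized fragment is not
# given in the text.
ASSUMED_MASS_DA = 29_000.0

vm = matthews_coefficient(volume, 4, ASSUMED_MASS_DA)
solvent = solvent_fraction(vm)

print(f"cell volume      = {volume:,.0f} cubic angstroms")
print(f"Matthews coeff.  = {vm:.2f} cubic angstroms per Da")
print(f"solvent fraction = {solvent:.2f}")
```

With a different assumed mass the estimate changes accordingly, so the result should be read only as an order-of-magnitude check against the solvent content assumed during solvent flattening in Section A.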

The structure of GFP was determined by multiple isomorphous replacement
and anomalous scattering (Table E), solvent flattening, phase combination and
crystallographic refinement. The most remarkable feature of the fold of GFP is an
eleven-stranded β-barrel wrapped around a single central helix (Fig. 1A and 1B), where each
strand consists of approximately 9-13 residues. The barrel forms a nearly perfect cylinder
42 Å long and 24 Å in diameter. The N-terminal half of the polypeptide comprises three
anti-parallel strands, the central helix, and then three more anti-parallel strands, the last
of which (residues 118-123) is parallel to the N-terminal strand (residues 11-23). The
polypeptide backbone then crosses the "bottom" of the molecule to form the second half of
the barrel in a five-strand Greek Key motif. The top end of the cylinder is capped by three
short, distorted helical segments, while one short, very distorted helical segment caps the
bottom of the cylinder. The main-chain hydrogen bonding lacing the surface of the cylinder
very likely accounts for the unusual stability of the protein towards denaturation and
proteolysis. There are no large segments of the polypeptide that could be excised while
preserving the intactness of the shell around the chromophore. Thus it would seem difficult
to re-engineer GFP to reduce its molecular weight (J. Dopf & T.M. Horiagan Gene 173:39-43
(1996)) by a large percentage.

The p-hydroxybenzylideneimidazolidinone chromophore (C. W. Cody et al.
Biochemistry 32:1212-1218 (1993)) is completely protected from bulk solvent and centrally
located in the molecule. The total and presumably rigid encapsulation is probably
responsible for the small Stokes' shift (i.e. wavelength difference between excitation and
emission maxima), high quantum yield of fluorescence, inability of O2 to quench the excited
state (B.D. Nageswara Rao et al. Biophys. J. 32:630-632 (1980)), and resistance of the
chromophore to titration of the external pH (W. W. Ward. Bioluminescence and
Chemiluminescence (M. A. DeLuca and W. D. McElroy, eds) Academic Press pp. 235-242
(1981); W. W. Ward & S. H. Bokman. Biochemistry 21:4535-4540 (1982); W. W. Ward et
al. Photochem. Photobiol. 35:803-808 (1982)). It also allows one to rationalize why
fluorophore formation should be a spontaneous intramolecular process (R. Heim et al. Proc.
Natl. Acad. Sci. USA 91:12501-12504 (1994)), as it is difficult to imagine how an enzyme
could gain access to the substrate. The plane of the chromophore is roughly perpendicular
(60°) to the symmetry axis of the surrounding barrel. One side of the chromophore faces a
surprisingly large cavity that occupies a volume of approximately 135 Å³ (B. Lee & F. M.
Richards. J. Mol. Biol. 55:379-400 (1971)). The volume was calculated using the program MS
with the atomic radii of Lee & Richards and a probe radius of 1.4 Å (M. L. Connolly, Science
221:709-713 (1983)). The cavity does not open out to bulk solvent. Four water molecules
are located in the cavity, forming a chain of hydrogen bonds linking the buried side chains
of Glu222 and Gln69. Unless occupied, such a large cavity would be expected to destabilize
the protein by several kcal/mol (S. J. Hubbard et al., Protein Engineering 7:613-626 (1994);
A. E. Eriksson et al. Science 255:178-183 (1992)). Part of the volume of the cavity might
be the consequence of the compaction resulting from cyclization and dehydration reactions.
The cavity might also temporarily accommodate the oxidant, most likely O2 (A. B. Cubitt
et al. Trends Biochem. Sci. 20:448-455 (1995); R. Heim et al. Proc. Natl. Acad. Sci. USA
91:12501-12504 (1994); S. Inouye & F.I. Tsuji. FEBS Lett. 351:211-214 (1994)), that
dehydrogenates the α-β bond of Tyr66. The chromophore, cavity, and side chains that
contact the chromophore are shown in Figure 2A and a portion of the final electron density
map in this vicinity in Figure 2B.

The opposite side of the chromophore is packed against several aromatic and
polar side chains. Of particular interest is the intricate network of polar interactions with the
chromophore (Fig. 2C). His148, Thr203 and Ser205 form hydrogen bonds with the phenolic
hydroxyl; Arg96 and Gln94 interact with the carbonyl of the imidazolidinone ring and Glu222
forms a hydrogen bond with the side chain of Thr65. Additional polar interactions, such as
hydrogen bonds to Arg96 from the carbonyl of Thr62 and the side-chain carbonyl of Gln183,
presumably stabilize the buried Arg96 in its protonated form. In turn, this buried charge
suggests that a partial negative charge resides on the carbonyl oxygen of the
imidazolidinone ring of the deprotonated fluorophore, as has previously been suggested (W.
W. Ward. Bioluminescence and Chemiluminescence (M. A. DeLuca and W. D. McElroy,
eds) Academic Press pp. 235-242 (1981); W. W. Ward & S. H. Bokman. Biochemistry
21:4535-4540 (1982); W. W. Ward et al. Photochem. Photobiol. 35:803-808 (1982)). Arg96
is likely to be essential for the formation of the fluorophore, and may help catalyze the
initial ring closure. Finally, Tyr145 shows a typical stabilizing edge-face interaction with the
benzyl ring. Trp57, the only tryptophan in GFP, is located 13 Å to 15 Å from the
chromophore and the long axes of the two ring systems are nearly parallel. This indicates
that efficient energy transfer to the latter should occur, and explains why no separate
tryptophan emission is observable (D.C. Prasher et al. Gene 111:229-233 (1992)). The two
cysteines in GFP, Cys48 and Cys70, are 24 Å apart, too distant to form a disulfide bridge.
Cys70 is buried, but Cys48 should be relatively accessible to sulfhydryl-specific reagents.
Such a reagent, 5,5'-dithiobis(2-nitrobenzoic acid), is reported to label GFP and quench its
fluorescence (S. Inouye & F.I. Tsuji FEBS Lett. 351:211-214 (1994)). This effect was
attributed to the necessity for a free sulfhydryl, but could also reflect specific quenching by
the 5-thio-2-nitrobenzoate moiety that would be attached to Cys48.

Although the electron density map is for the most part consistent with the
proposed structure of the chromophore (D.C. Prasher et al. Gene 111:229-233 (1992); C. W.
Cody et al. Biochemistry 32:1212-1218 (1993)) in the cis [Z-] configuration, with no
evidence for any substantial fraction of the opposite isomer around the chromophore double
bond, difference features are found at >4σ in the final (Fo-Fc) electron density map that can
be interpreted to represent either the intact, uncyclized polypeptide or a carbinolamine (inset
to Fig. 2C). This suggests that a significant fraction, perhaps as much as 30%, of the
molecules in the crystal have failed to undergo the final dehydration reaction.
Confirmation of incomplete dehydration comes from electrospray mass spectrometry, which
consistently shows that the average masses of both wild-type and S65T GFP (31,086±4 and
31,099.5±4 Da, respectively) are 6-7 Da higher than predicted (31,079 and 31,093 Da,
respectively) for the fully matured proteins. Such a discrepancy could be explained by a
30-35% mole fraction of apoprotein or carbinolamine with 18 or 20 Da higher molecular
weight. The natural abundance of 13C and 2H and the finite resolution of the Hewlett-Packard
5989B electrospray mass spectrometer used to make these measurements do not permit the
individual peaks to be resolved, but instead yield an average mass peak with a full width at
half maximum of approximately 15 Da. The molecular weights shown include the His-tag,
which has the sequence MRGSHHHHHH GMASMTGGQQM GRDLYDDDDK DPPAEF
(SEQ ID NO:5). Mutants of GFP that increase the efficiency of fluorophore maturation
might yield somewhat brighter preparations. In a model for the apoprotein, the Thr65-Tyr66
peptide bond is approximately in the α-helical conformation, while the peptide of
Tyr66-Gly67 appears to be tipped almost perpendicular to the helix axis by its interaction
with Arg96. This further supports the speculation that Arg96 is important in generating the
conformation required for cyclization, and possibly also for promoting the attack of Gly67 on
the carbonyl carbon of Thr65 (A. B. Cubitt et al. Trends Biochem. Sci. 20:448-455 (1995)).
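The mass-spectrometry argument in this paragraph is a weighted-average calculation: if a mole fraction f of the molecules is heavier by some excess mass, the unresolved average peak shifts by f times that excess. A minimal sketch of that arithmetic follows, using the masses quoted in the text; the function names are illustrative and not part of the original work.

```python
def average_mass_shift(fraction_immature, mass_excess_da):
    """Shift of the average mass when a mole fraction of molecules is heavier."""
    return fraction_immature * mass_excess_da


def implied_fraction(observed_da, predicted_da, mass_excess_da):
    """Mole fraction of heavier species needed to explain an observed excess."""
    return (observed_da - predicted_da) / mass_excess_da


# Expected shifts for the mole fractions and mass excesses quoted in the text.
for fraction in (0.30, 0.35):
    for excess in (18.0, 20.0):
        shift = average_mass_shift(fraction, excess)
        print(f"fraction {fraction:.2f}, excess {excess:.0f} Da -> shift {shift:.1f} Da")

# Observed and predicted average masses (Da) quoted in the text.
measurements = {
    "wild-type": (31086.0, 31079.0),
    "S65T": (31099.5, 31093.0),
}
for name, (observed, predicted) in measurements.items():
    for excess in (18.0, 20.0):
        fraction = implied_fraction(observed, predicted, excess)
        print(f"{name}: excess {excess:.0f} Da -> implied fraction {fraction:.2f}")
```

The calculation ignores the stated ±4 Da measurement uncertainty, so the implied fractions are indicative only.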

Previous random mutagenesis has implicated several amino acid side chains as
having substantial effects on the spectra, and the atomic model confirms
that these residues are close to the chromophore. The mutations T203I and E222G have
profound but opposite consequences on the absorption spectrum (T. Ehrig et al. FEBS
Letters 367:163-166 (1995)). T203I (with wild-type Ser65) lacks the 475 nm absorbance
peak usually attributed to the anionic chromophore and shows only the 395 nm peak
thought to reflect the neutral chromophore (R. Heim et al. Proc. Natl. Acad. Sci. USA
91:12501-12504 (1994); T. Ehrig et al. FEBS Letters 367:163-166 (1995)). Indeed, Thr203 is
hydrogen-bonded to the phenolic oxygen of the chromophore, so replacement by Ile should
hinder ionization of the phenolic oxygen. Mutation of Glu222 to Gly (T. Ehrig et al. FEBS
Letters 367:163-166 (1995)) has much the same spectroscopic effect as replacing Ser65 by
Gly, Ala, Cys, Val, or Thr, namely to suppress the 395 nm peak in favor of a peak at
470-490 nm (R. Heim et al. Nature 373:664-665 (1995); S. Delagrave et al. Bio/Technology
13:151-154 (1995)). Indeed, Glu222 and the remnant of Thr65 are hydrogen-bonded to each
other in the present structure, probably with the uncharged carboxyl of Glu222 acting as
donor to the side chain oxygen of Thr65. Mutations E222G, S65G, S65A, and S65V would
all suppress such H-bonding. To explain why only wild-type protein has both excitation
peaks, Ser65, unlike Thr65, may adopt a conformation in which its hydroxyl donates a
hydrogen bond to and stabilizes Glu222 as an anion, whose charge then inhibits ionization of
the chromophore. The structure also explains why some mutations seem neutral. For
example, Gln80 is a surface residue far removed from the chromophore, which explains why
its accidental and ubiquitous mutation to Arg seems to have no obvious intramolecular
spectroscopic effect (M. Chalfie et al. Science 263:802-805 (1994)).

The development of GFP mutants with red-shifted excitation and emission
maxima is an interesting challenge in protein engineering (A. B. Cubitt et al. Trends
Biochem. Sci. 20:448-455 (1995); R. Heim et al. Nature 373:664-665 (1995); S. Delagrave
et al. Bio/Technology 13:151-154 (1995)). Such mutants would also be valuable for
avoidance of cellular autofluorescence at short wavelengths, for simultaneous multicolor
reporting of the activity of two or more cellular processes, and for exploitation of
fluorescence resonance energy transfer as a signal of protein-protein interaction (R. Heim &
R.Y. Tsien. Current Biol. 6:178-182 (1996)). Extensive attempts using random
mutagenesis have shifted the emission maximum by at most 6 nm to longer wavelengths, to
514 nm (R. Heim & R.Y. Tsien. Current Biol. 6:178-182 (1996)); previously described
"red-shifted" mutants merely suppressed the 395 nm excitation peak in favor of the 475 nm
peak without any significant reddening of the 505 nm emission (S. Delagrave et al.
Bio/Technology 13:151-154 (1995)). Because Thr203 is revealed to be adjacent to the
phenolic end of the chromophore, we mutated it to polar aromatic residues such as His, Tyr,
and Trp in the hope that the additional polarizability of their π systems would lower the
energy of the excited state of the adjacent chromophore. All three substitutions did indeed
shift the emission peak to greater than 520 nm (Table F). A particularly attractive mutation
was T203Y/S65G/V68L/S72A, with excitation and emission peaks at 513 and 527 nm
respectively. These wavelengths are sufficiently different from previous GFP mutants to be
readily distinguishable by appropriate filter sets on a fluorescence microscope. The
extinction coefficient, 36,500 M⁻¹cm⁻¹, and quantum yield, 0.63, are almost as high as those
of S65T (R. Heim et al. Nature 373:664-665 (1995)).

Comparison of Aequorea GFP with other protein pigments is instructive.
Unfortunately, its closest characterized homolog, the GFP from the sea pansy Renilla
reniformis (O. Shimomura and F.H. Johnson J. Cell. Comp. Physiol. 59:223 (1962); J. G.
Morin and J. W. Hastings, J. Cell. Physiol. 77:313 (1971); H. Morise et al. Biochemistry
13:2656 (1974); W. W. Ward Photochem. Photobiol. Reviews (Smith, K. C. ed.) 4:1 (1979);
W. W. Ward. Bioluminescence and Chemiluminescence (M. A. DeLuca and W. D.
McElroy, eds) Academic Press pp. 235-242 (1981); W. W. Ward & S. H. Bokman
Biochemistry 21:4535-4540 (1982); W. W. Ward et al. Photochem. Photobiol. 35:803-808
(1982)), has not been sequenced or cloned, though its chromophore is derived from the
same FSYG sequence as in wild-type Aequorea GFP (R. M. San Pietro et al. Photochem.
Photobiol. 57:63S (1993)). The closest analog for which a three dimensional structure is
available is the photoactive yellow protein (PYP, G. E. O. Borgstahl et al. Biochemistry
34:6278-6287 (1995)), a 14-kDa photoreceptor from halophilic bacteria. PYP in its native
dark state absorbs maximally at 446 nm and transduces light with a quantum yield of 0.64,
rather closely matching wild-type GFP's long wavelength absorbance maximum near 475
nm and fluorescence quantum yield of 0.72-0.85. The fundamental chromophore in both
proteins is an anionic p-hydroxycinnamyl group, which is covalently attached to the protein
via a thioester linkage in PYP and a heterocyclic iminolactam in GFP. Both proteins
stabilize the negative charge on the chromophore with the help of buried cationic arginine
and neutral glutamic acid groups, Arg52 and Glu46 in PYP and Arg96 and Glu222 in GFP,
though in PYP the residues are close to the oxyphenyl ring whereas in GFP they are nearer
the carbonyl end of the chromophore. However, PYP has an overall α/β fold with
appropriate flexibility and signal transduction domains to enable it to mediate the cellular
phototactic response, whereas GFP is a much more regular and rigid β-barrel to minimize
parasitic dissipation of the excited state energy as thermal or conformational motions. GFP
is an elegant example of how a visually appealing and extremely useful function, efficient
fluorescence, can be spontaneously generated from a cohesive and economical protein
structure.

A. Summary Of GFP Structure Determination 

Data were collected at room temperature in house using either Molecular
Structure Corp. R-axis II or San Diego Multiwire Systems (SDMS) detectors (Cu Kα) and
later at beamline X4A at the Brookhaven National Laboratory at the selenium absorption
edge (λ = 0.979 Å) using image plates. Data were evaluated using the HKL package (Z.
Otwinowski, in Proceedings of the CCP4 Study Weekend: Data Collection and Processing,
L. Sawyer, N. Isaacs, S. Bailey, Eds. (Science and Engineering Research Council (SERC),
Daresbury Laboratory, Warrington, UK, 1991), pp 56-62; W. Minor, XDISPLAYF
(Purdue University, West Lafayette, IN, 1993)) or the SDMS software (A. J. Howard et al.
Meth. Enzymol. 114:452-471 (1985)). Each data set was collected from a single crystal.
Heavy atom soaks were 2 mM in mother liquor for 2 days. Initial electron density maps
were based on three heavy atom derivatives using in-house data, then later were replaced
with the synchrotron data. The EMTS difference Patterson map was solved by inspection,
then used to calculate difference Fourier maps of the other derivatives. Lack of closure
refinement of the heavy atom parameters was performed using the Protein package (W.
Steigemann, in Ph.D. Thesis (Technical University, Munich, 1974)). The MIR maps were
much poorer than the overall figure of merit would suggest, and it was clear that the EMTS
isomorphous differences dominated the phasing. The enhanced anomalous occupancy for
the synchrotron data provided a partial solution to the problem. Note that the phasing power
was reduced for the synchrotron data, but the figure of merit was unchanged. All
experimental electron density maps were improved by solvent flattening using the program
DM of the CCP4 package (CCP4: A Suite of Programs for Protein Crystallography (SERC
Daresbury Laboratory, Warrington WA4 4AD UK, 1979)) assuming a solvent
content of 38%. Phase combination was performed with PHASCO2 of the Protein package
using a weight of 1.0 on the atomic model. Heavy atom parameters were subsequently
improved by refinement against combined phases. Model building proceeded with FRODO
and O (T. A. Jones et al. Acta. Crystallogr. Sect. A 47:110 (1991); T. A. Jones, in
Computational Crystallography D. Sayre, Ed. (Oxford University Press, Oxford, 1982) pp.
303-317) and crystallographic refinement was performed with the TNT package (D. E.
Tronrud et al. Acta Cryst. A 43:489-503 (1987)). Bond lengths and angles for the
chromophore were estimated using CHEM3D (Cambridge Scientific Computing). Final
refinement and model building were performed against the X4A selenomethionine data set,
using (2Fo-Fc) electron density maps. The data beyond 1.9 Å resolution have not been used
at this stage. The final model contains residues 2-229, as the terminal residues are not
visible in the electron density map, and the side chains of several disordered surface
residues have been omitted. Density is weak for residues 156-158 and coordinates for these
residues are unreliable. This disordering is consistent with previous analyses showing that
residues 1 and 233-238 are dispensable but that further truncations may prevent fluorescence
(J. Dopf & T.M. Horiagan. Gene 173:39-43 (1996)). The atomic model has been deposited
in the Protein Data Bank (access code 1EMA).
Table E

Diffraction Data Statistics

Crystal        Resolution   Total    Unique   Compl.   Compl.       Rmerge   Riso
               (Å)          obs      obs      (%)(a)   (shell)(b)   (c)      (d)

R-axis II
  Native       2.0           51907   13582    80       69            4.1      5.8
  EMTS(e)      2.6           17727    6787    87       87            5.7     20.6
  SeMet        2.3           44975   10292    92       88           10.2      9.3
Multiwire
  HgI4-Se      3.0           15380    4332    84       79            7.2     28.8
X4A
  SeMet        1.8          126078   19503    80       55            9.3      9.4
  EMTS         2.3           57812    9204    82       66            7.2     26.3

Phasing Statistics

Derivative     Resolution   Number     Phasing     Phasing power   FOM(g)   FOM
               (Å)          of sites   power(f)    (shell)                  (shell)

In House
  EMTS         3.0          2          2.08        2.08            0.77     .072
  SeMet        3.0          4          1.66        1.28
  HgI4-Se      3.0          9          1.77        1.90
X4A
  EMTS         3.0          2          1.36        1.26            0.77     .072
  SeMet        3.0          4          1.31        1.08

Atomic Model Statistics

Protein atoms                      1790
Solvent atoms                      94
Resol. range (Å)                   20-1.9
Number of reflections (F > 0)      17676
Completeness                       84.
R-factor(h)                        0.175
Mean B-value (Å²)                  24.1

Deviations from ideality
Bond lengths (Å)                   0.014
Bond angles (°)                    1.9
Restrained B-values (Å²)           4.3
Ramachandran outliers              0

Notes:

(a) Completeness is the ratio of observed reflections to theoretically possible, expressed
as a percentage.

(b) Shell indicates the highest resolution shell, typically 0.1-0.4 Å wide.

(c) Rmerge = Σ|I - <I>| / ΣI, where <I> is the mean of individual observations of
intensities I.

(d) Riso = Σ|I(DER) - I(NAT)| / ΣI(NAT)

(e) Derivatives were EMTS = ethylmercurithiosalicylate (residues modified Cys48 and
Cys70), SeMet = selenomethionine substituted protein (Met1 and Met233 could not be
located); HgI4-SeMet = double derivative HgI4 on SeMet background.

(f) Phasing power = <FH>/<E>, where <FH> = r.m.s. heavy atom scattering and <E> = lack
of closure.

(g) FOM, mean figure of merit

(h) Standard crystallographic R-factor, R = Σ||Fobs| - |Fcalc|| / Σ|Fobs|
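The residuals defined in notes (c), (d) and (h) are all ratios of summed absolute differences to summed reference values. The sketch below shows one way to evaluate them as fractions (the table reports Rmerge and Riso as percentages). The data layout (a dictionary of repeated intensity measurements per reflection, and paired lists for the other two statistics) and the toy numbers are assumptions for illustration; they are not taken from the data sets in Table E.

```python
def r_merge(observations):
    """Rmerge = sum|I - <I>| / sum I over repeated measurements.

    `observations` maps a reflection index to its list of measured intensities.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def r_iso(derivative, native):
    """Riso = sum|I_DER - I_NAT| / sum I_NAT for paired intensities."""
    return sum(abs(d - n) for d, n in zip(derivative, native)) / sum(native)


def r_factor(f_obs, f_calc):
    """R = sum||Fobs| - |Fcalc|| / sum|Fobs| for paired structure factors."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / sum(abs(o) for o in f_obs)


# Toy values (hypothetical) to show the call pattern.
toy_observations = {(1, 0, 0): [100.0, 110.0], (0, 1, 0): [50.0, 50.0]}
print(f"Rmerge = {r_merge(toy_observations):.4f}")
print(f"Riso   = {r_iso([110.0, 45.0], [100.0, 50.0]):.4f}")
print(f"R      = {r_factor([10.0, 20.0], [9.0, 22.0]):.4f}")
```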



B. Spectral properties of Thr203 ("T203") mutants compared to S65T

The mutations F64L, V68L and S72A improve the folding of GFP at 37 °C
(B. P. Cormack et al. Gene 173:33 (1996)) but do not significantly shift the emission
spectra.

TABLE F

Clone   Mutations                     Excitation   Extinction        Emission
                                      max. (nm)    coefficient       max. (nm)
                                                   (10³ M⁻¹cm⁻¹)

S65T    S65T                          489          39.2              511
5B      T203H/S65T                    512          19.4              524
6C      T203Y/S65T                    513          14.5              525
10B     T203Y/F64L/S65G/S72A          513          30.8              525
10C     T203Y/S65G/V68L/S72A          513          36.5              527
11      T203W/S65G/S72A               502          33.0              512
12H     T203Y/S65G/S72A               513          36.5              527
20A     T203Y/S65G/V68L/Q69K/S72A     515          46.0              527
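Two quantities can be read directly off Table F: the Stokes' shift of each clone (emission maximum minus excitation maximum, as defined earlier in the text) and the shift of its emission maximum relative to S65T. The short sketch below tabulates both from the excitation and emission maxima listed in the table; the dictionary layout and function name are illustrative choices.

```python
# Excitation and emission maxima (nm) per clone, as listed in Table F.
TABLE_F = {
    "S65T": (489, 511),
    "5B": (512, 524),
    "6C": (513, 525),
    "10B": (513, 525),
    "10C": (513, 527),
    "11": (502, 512),
    "12H": (513, 527),
    "20A": (515, 527),
}


def stokes_shift(excitation_nm, emission_nm):
    """Wavelength difference between emission and excitation maxima."""
    return emission_nm - excitation_nm


reference_emission = TABLE_F["S65T"][1]

stokes = {clone: stokes_shift(ex, em) for clone, (ex, em) in TABLE_F.items()}
emission_shift = {clone: em - reference_emission for clone, (_, em) in TABLE_F.items()}

for clone in TABLE_F:
    print(f"{clone:>5}: Stokes' shift {stokes[clone]:2d} nm, "
          f"emission shift vs S65T {emission_shift[clone]:+d} nm")
```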



The present invention provides novel long wavelength engineered
fluorescent proteins. While specific examples have been provided, the above description is
illustrative and not restrictive. Many variations of the invention will become apparent to
those skilled in the art upon review of this specification. The scope of the invention should,
therefore, be determined not with reference to the above description, but instead should be
determined with reference to the appended claims along with their full scope of equivalents.

All publications and patent documents cited in this application are
incorporated by reference in their entirety for all purposes to the same extent as if each
individual publication or patent document were so individually denoted.

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

ATG AGT AAA GGA GAA GAA CTT TTC ACT GCA GTT GTC CCA ATT CTT GTT
Met Ser Lys Gly Glu Glu Leu Phe Thr Ala Val Val Pro Ile Leu Val
1 5 10 15

GAA TTA GAT GGT GAT GTT AAT GGG CAC AAA TTT TCT GTC AGT GGA GAG 
Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly Glu 



ACT ACT GGA AAA CTA CCT GTT CCA TGG CCA ACA CTT GTC ACT ACT TTC 

Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr Phe 
50 55 60 

TCT TAT GGT GTT CAA TGC TTT TCA AGA TAC CCA GAT CAT ATG AAA CGG 

Ser Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met Lys Arg



ACT ATA TTT TTC AAA GAT GAC GGG AAC TAC AAG ACA CGT GCT GAA GTC 

Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu Val

100 105 110 

AAG TTT GAA GGT GAT ACC CTT GTT AAT AGA ATC GAG TTA AAA GGT ATT 

Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly Ile
115 120 125 

GAT TTT AAA GAA GAT GGA AAC ATT CTT GGA CAT AAA TTG GAA TAC AAC 

Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr Asn
130 135 140 

TAT AAC TCA CAC AAT GTA TAC ATC ATG GCA GAC AAA CAA AAG AAT GGA 

Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn Gly
145 150 155 160 

ATC AAA GTT AAC TTC AAA ATT AGA CAC AAC ATT GAA GAT GGA AGC GTT 

Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly Ser Val
165 170 175 

CAA CTA GCA GAC TAT TAT CAA CAA AAT ACT CCA ATT CTC GAT GGC CCT

Gln Leu Ala Asp Tyr Tyr Gln Gln Asn Thr Pro Ile Leu Asp Gly Pro

180 185 190 

GTC CTT TTA CCA GAC AAC CAT TAC CTG TCC ACA CAA TCT GCC CTT TCG 

Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala Leu Ser
195 200 205 

AAA GAT CCC AAC GAA AAG AGA GAC CAC ATG GTC CTT CTT GAG TTT GTA 

Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe Val 
210 215 220 

ACA GCT GCT GGG ATT ACA CAT GGC ATG GAT GAA CTA TAC AAA 

Thr Ala Ala Gly Ile Thr His Gly Met Asp Glu Leu Tyr Lys
225 230 235 



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 238 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Ser Lys Gly Glu Glu Leu Phe Thr Ala Val Val Pro Ile Leu Val
1 5 10 15

Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly Glu 
20 25 30 

Gly Glu Gly Asp Val Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile Cys
35 40 45 

Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr Phe
50 55 60

Ser Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met Lys Arg
65 70 75 80 

His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Gln Arg



Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn Gly
145 150 155 160 



Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe Val
210 215 220 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 720 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 1..720

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

ATG GTG AGC AAG GGC GAG GAG CTG TTC ACC GGG GTG GTG CCC ATC CTG 48 

Met Val Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu
240 245 250 

GTC GAG CTG GAC GGC GAC GTA AAC GGC CAC AAG TTC AGC GTG TCC GGC 96 

Val Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly 
255 260 265 270 

GAG GGC GAG GGC GAT GCC ACC TAC GGC AAG CTG ACC CTG AAG TTC ATC 144 

Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile

275 280 285 

TGC ACC ACC GGC AAG CTG CCC GTG CCC TGG CCC ACC CTC GTG ACC ACC 192 

Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr
290 295 300 

TTC GGC TAC GGC GTG CAG TGC TTC GCC CGC TAC CCC GAC CAC ATG AAG 240 

Phe Gly Tyr Gly Val Gln Cys Phe Ala Arg Tyr Pro Asp His Met Lys
305 310 315 

CAG CAG GAC TTC TTC AAG TCC GCC ATG CCC GAA GGC TAC GTC CAG GAG 288 

Gln Gln Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Glu
320 325 330 

CGC ACC ATC TTC TTC AAG GAC GAC GGC AAC TAC AAG ACC CGC GCC GAG 336 

Arg Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu
335 340 345 350 

GTG AAG TTC GAG GGC GAC ACC CTG GTG AAC CGC ATC GAG CTG AAG GGC 384 

Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly

355 360 365

ATC GAC TTC AAG GAC GAC GGC AAC ATC CTG GGG CAC AAG CTG GAG TAC 432 

Ile Asp Phe Lys Asp Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr
370 375 380

AAC TAC AAC AGC CAC AAC GTC TAT ATC ATG GCC GAC AAG CAG AAG AAC 480 

Asn Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn
385 390 395 

GGC ATC AAG GTG AAC TTC AAG ATC CGC CAC AAC ATC GAG GAC GGC AGC 528

Gly Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly Ser
400 405 410 

GTG CAG CCC GCC GAC CAC TAC CAG CAG AAC ACC CCC ATC GGC GAC GGC 576 

Val Gln Pro Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly
415 420 425 430 

CCC GTG CTG CTG CCC GAC AAC CAC TAC CTG AGC TAC CAG TCC GCC CTG 624 

Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Tyr Gln Ser Ala Leu

435 440 445 

AGC AAA GAC CCC AAC GAG AAG CGC GAT CAC ATG GTC CTG CTG GAG TTC 672

Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe 
450 455 460 

GTG ACC GCC GCC GGG ATC ACT CAC GGC ATG GAC GAG CTG TAC AAG TAA 720 

Val Thr Ala Ala Gly Ile Thr His Gly Met Asp Glu Leu Tyr Lys *
465 470 475 
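The pairing of codon lines and amino-acid lines in the listing can be checked mechanically with the standard genetic code. The following is a minimal sketch, assuming the standard code and using only the first and last codon lines of SEQ ID NO:3 as printed above; the helper names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding sequence; whitespace is ignored, '*' marks a stop."""
    sequence = "".join(dna.split()).upper()
    if len(sequence) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(
        CODON_TABLE[sequence[i:i + 3]] for i in range(0, len(sequence), 3)
    )


# First and last codon lines of SEQ ID NO:3 as printed in the listing.
first_line = "ATG GTG AGC AAG GGC GAG GAG CTG TTC ACC GGG GTG GTG CCC ATC CTG"
last_line = "GTG ACC GCC GCC GGG ATC ACT CAC GGC ATG GAC GAG CTG TAC AAG TAA"

print(translate(first_line))
print(translate(last_line))
```

A coding region of 720 bases, as stated in the FEATURE entry, corresponds to 240 codons including the terminal stop codon.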



(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 240 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Met Val Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu
1 5 10 15

Val Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly
20 25 30

Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile
35 40 45

Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr
50 55 60

Phe Gly Tyr Gly Val Gln Cys Phe Ala Arg Tyr Pro Asp His Met Lys
65 70 75 80

Gln Gln Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Glu
85 90 95

Arg Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu
100 105 110

Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly
115 120 125

Ile Asp Phe Lys Asp Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr
130 135 140

Asn Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn
145 150 155 160

Gly Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly Ser
165 170 175

Val Gln Pro Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly
180 185 190

Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Tyr Gln Ser Ala Leu
195 200 205

Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe
210 215 220

Val Thr Ala Ala Gly Ile Thr His Gly Met Asp Glu Leu Tyr Lys *
225 230 235 240



WO 98/06737 PCT/US97/14593

WHAT IS CLAIMED IS:

2 1 . A nucleic acid molecule comprising a nucleotide sequence encoding 

3 a functional engineered fluorescent protein whose amino acid sequence is substantially 

4 identical to the amino acid sequence of Aequorea green fluorescent protein (SEQ ID NO:2) 

5 and which differs from SEQ ID NO:2 by at least the substitution T203X, wherein X is an 

6 aromatic amino acid selected from H, Y, W or F, said functional engineered fluorescent 

7 protein having a different fluorescent property than Aequorea green fluorescent protein. 
8 

1 2. The nucleic acid molecule of claim 1 wherein the amino acid 

2 sequence further comprises a substitution at S65, wherein the substitution is selected from 

3 S65G, S65T, S65A, S65L, S65C, S65V and S65I. 

1 

1 3. The nucleic acid molecule of claim 1 wherein the amino acid 

2 sequence differs by no more than the substitutions S65T/T203H; S65T/T203Y; 

3 S72A/F64L/S65G/T203Y; S72A/S65G/V68LAT203Y; S65G/V68L/Q69K/S72A/T203Y; 

4 S65G/S72A/T203Y; or S65G/S72A/T203W. 

1 4. The nucleic acid molecule of claim 1 or 2 wherein the amino acid 

2 sequence further comprises a substitution at Y66, wherein the substitution is selected from. 

3 Y66H, Y66F, and Y66W. 

1 5. The nucleic acid molecule of claim 1 or 2 wherein the amino acid 

2 sequence further comprises a mutation from Table A. 



1 6. The nucleic acid molecule of claim 1 or 2 wherein the amino acid 

2 sequence further comprises a folding mutation. 
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1 7. The nucleic acid molecule of any of claims 1 -3 wherein the 

2 nucleotide sequence encoding the protein differs from the nucleotide sequence of SEQ ID 

3 NO: 1 by the substitution of at least one codon by a preferred mammalian codon. 

1 8. The nucieic acid molecule of any of claims 1-3 encoding a fusion 

2 protein wherein the fusion protein comprises a polypeptide of interest and the functional 

3 engineered fluorescent protein. 

1 9. An expression vector comprising expression control sequences 

2 operatively linked to a nucleic acid molecule comprising a nucleotide sequence encoding a 

3 functional engineered fluorescent protein whose amino acid sequence is substantially 

4 identical to the amino acid sequence of Aequorea green fluorescent protein (SEQ ID NO:2) 

5 and which differs from SEQ ID NO:2 by at least the amino acid substitution T203X, 

6 wherein X is an aromatic amino acid selected from H, Y, W or F, said functional engineered 

7 fluorescent protein having a different fluorescent property than Aequorea green fluorescent 

8 protein. 

1 10. The expression vector of claim 9 wherein the amino acid sequence 

2 further comprises a substitution at S65, wherein the substitution is selected from S65G, 

3 S65T, S65A, S65L, S65C, S65V and S65I. 

1 11. The expression vector of claim 9 wherein the amino acid sequence 

2 differs by no more than the substitutions S65T/T203H; S65T/T203Y; 

3 S72A/F64L/S65G/T203Y; S72A/S65G/V68L/T203Y; S65G/V68L/Q69K/S72A/T203Y; 

4 S65G/S72A/T203Y; or S65G/S72A/T203W. 

1 12. The expression vector of claim 10 or 11 wherein the amino acid 

2 sequence further comprises a substitution at Y66, wherein the substitution is selected from 

3 Y66H, Y66F, and Y66W. 

1 13. The expression vector of claim 10 or 11 wherein the amino acid 

2 sequence further comprises a mutation from Table A. 

1 14. The expression vector of claim 9 or 10 wherein the amino acid 

2 sequence further comprises a folding mutation. 

1 15. The expression vector of any of claims 9-11 wherein the nucleotide 

2 sequence encoding the protein differs from the nucleotide sequence of SEQ ID NO:1 by the 

3 substitution of at least one codon by a preferred mammalian codon. 

1 16. The expression vector of any of claims 9-11 encoding a fusion 

2 protein wherein the fusion protein comprises a polypeptide of interest and the functional 

3 engineered fluorescent protein. 

1 17. A recombinant host cell comprising an expression vector that 

2 comprises expression control sequences operatively linked to a nucleic acid molecule 

3 comprising a nucleotide sequence encoding a functional engineered fluorescent protein 

4 whose amino acid sequence is substantially identical to the amino acid sequence of 

5 Aequorea green fluorescent protein (SEQ ID NO:2) and which differs from SEQ ID NO:2 

6 by at least the amino acid substitution T203X, wherein X is an aromatic amino acid selected 

7 from H, Y, W or F, said functional engineered fluorescent protein having a different 

8 fluorescent property than Aequorea green fluorescent protein. 






18. The recombinant host cell of claim 17 wherein the amino acid 
sequence further comprises a substitution at S65, wherein the substitution is selected from 
S65G, S65T, S65A, S65L, S65C, S65V and S65I. 

1 19. The recombinant host cell of claim 17 wherein the amino acid 

2 sequence differs by no more than the substitutions S65T/T203H; S65T/T203Y; 

3 S72A/F64L/S65G/T203Y; S72A/S65G/V68L/T203Y; S65G/V68L/Q69K/S72A/T203Y; 

4 S65G/S72A/T203Y; or S65G/S72A/T203W. 

1 20. The recombinant host cell of claim 17 or 18 wherein the amino acid 

2 sequence further comprises a substitution at Y66, wherein the substitution is selected from 

3 Y66H, Y66F, and Y66W. 

1 21. The recombinant host cell of claim 17 or 18 wherein the amino acid 

2 sequence further comprises a mutation from Table A. 

1 22. The recombinant host cell of claim 17 or 18 wherein the amino acid 

2 sequence further comprises a folding mutation. 

1 23. The recombinant host cell of any of claims 17-19 wherein the 

2 nucleotide sequence encoding the protein differs from the nucleotide sequence of SEQ ID 

3 NO:1 by the substitution of at least one codon by a preferred mammalian codon. 

1 24. The recombinant host cell of any of claims 17-19 encoding a fusion 

2 protein wherein the fusion protein comprises a polypeptide of interest and the functional 

3 engineered fluorescent protein. 

1 25. The recombinant host cell of any of claims 17-19 which is a 

2 prokaryotic cell. 

1 26. The recombinant host cell of any of claims 17-19 which is a 

2 eukaryotic cell. 

1 27. A functional engineered fluorescent protein whose amino acid 

2 sequence is substantially identical to the amino acid sequence of Aequorea green fluorescent 

3 protein (SEQ ID NO:2) and which differs from SEQ ID NO:2 by at least the amino acid 

4 substitution T203X, wherein X is an aromatic amino acid selected from H, Y, W or F, said 

5 functional engineered fluorescent protein having a different fluorescent property than 

6 Aequorea green fluorescent protein. 

1 28. The protein of claim 27 wherein the amino acid sequence further 

2 comprises a substitution at S65, wherein the substitution is selected from S65G, S65T, 

3 S65A, S65L, S65C, S65V and S65I. 

1 29. The protein of claim 27 wherein the amino acid sequence differs by 

2 no more than the substitutions S65T/T203H; S65T/T203Y; S72A/F64L/S65G/T203Y; 

3 S72A/S65G/V68L/T203Y; S65G/V68L/Q69K/S72A/T203Y; S65G/S72A/T203Y; or 

4 S65G/S72A/T203W. 

1 30. The protein of claim 27 or 28 wherein the amino acid sequence 

2 further comprises a substitution at Y66, wherein the substitution is selected from Y66H, 

3 Y66F, and Y66W. 

1 31. The protein of claim 27 or 28 wherein the amino acid sequence 

2 further comprises a folding mutation. 

1 32. The protein of any of claims 27-29 which is a fusion protein wherein 

2 the fusion protein comprises a polypeptide of interest and the functional engineered 

3 fluorescent protein. 

1 33. A fluorescently labelled antibody comprising an antibody coupled to 

2 a functional engineered fluorescent protein whose amino acid sequence is substantially 

3 identical to the amino acid sequence of Aequorea green fluorescent protein (SEQ ID NO:2) 

4 and which differs from SEQ ID NO:2 by at least the amino acid substitution T203X, 

5 wherein X is an aromatic amino acid selected from H, Y, W or F, said functional engineered 

6 fluorescent protein having a different fluorescent property than Aequorea green fluorescent 

7 protein. 

1 34. The fluorescently labelled antibody of claim 33 wherein the amino 

2 acid sequence further comprises a substitution at S65, wherein the substitution is selected 

3 from S65G, S65T, S65A, S65L, S65C, S65V and S65I. 

1 35. The fluorescently labelled antibody of claim 33 wherein the amino 

2 acid sequence differs by no more than the substitutions S65T/T203H; S65T/T203Y; 

3 S72A/F64L/S65G/T203Y; S72A/S65G/V68L/T203Y; S65G/V68L/Q69K/S72A/T203Y; 

4 S65G/S72A/T203Y; or S65G/S72A/T203W. 

1 36. The fluorescently labelled antibody of claim 33 or 34 wherein the 

2 amino acid sequence further comprises a substitution at Y66, wherein the substitution is 

3 selected from Y66H, Y66F, and Y66W. 

1 37. The fluorescently labelled antibody of any of claims 33-35 which is a 

2 fusion protein wherein the fusion protein comprises the antibody fused to the functional 

3 engineered fluorescent protein. 

1 38. A nucleic acid molecule comprising a nucleotide sequence encoding 

2 an antibody fused to a nucleotide sequence encoding a functional engineered fluorescent 

3 protein whose amino acid sequence is substantially identical to the amino acid sequence of 

4 Aequorea green fluorescent protein (SEQ ID NO:2) and which differs from SEQ ID NO:2 

5 by at least the amino acid substitution T203X, wherein X is an aromatic amino acid selected 

6 from H, Y, W or F, said functional engineered fluorescent protein having a different 
7 fluorescent property than Aequorea green fluorescent protein. 

1 39. The nucleic acid molecule of claim 38 wherein the amino acid 

2 sequence further comprises a substitution at S65, wherein the substitution is selected from 

3 S65G, S65T, S65A, S65L, S65C, S65V and S65I. 

1 40. The nucleic acid molecule of claim 38 wherein the amino acid 

2 sequence differs by no more than the substitutions S65T/T203H; S65T/T203Y; 

3 S72A/F64L/S65G/T203Y; S72A/S65G/V68L/T203Y; S65G/V68L/Q69K/S72A/T203Y; 

4 S65G/S72A/T203Y; or S65G/S72A/T203W. 

1 41. The nucleic acid molecule of claim 38 or 39 wherein the amino acid 

2 sequence further comprises a substitution at Y66, wherein the substitution is selected from 

3 Y66H, Y66F, and Y66W. 

1 42. A fluorescently labelled nucleic acid probe comprising a nucleic acid 

2 probe coupled to a functional engineered fluorescent protein whose amino acid sequence is 

3 substantially identical to the amino acid sequence of Aequorea green fluorescent protein 

4 (SEQ ID NO:2) and which differs from SEQ ID NO:2 by at least the amino acid substitution 

5 T203X, wherein X is an aromatic amino acid selected from H, Y, W or F, said functional 

6 engineered fluorescent protein having a different fluorescent property than Aequorea green 

7 fluorescent protein. 

1 43. The fluorescently labelled nucleic acid probe of claim 42 wherein the 

2 amino acid sequence further comprises a substitution at S65, wherein the substitution is 

3 selected from S65G, S65T, S65A, S65L, S65C, S65V and S65I. 

1 44. The fluorescently labelled nucleic acid probe of claim 42 wherein the 

2 amino acid sequence differs by no more than the substitutions S65T/T203H; S65T/T203Y; 

3 S72A/F64L/S65G/T203Y; S72A/S65G/V68L/T203Y; S65G/V68L/Q69K/S72A/T203Y; 

4 S65G/S72A/T203Y; or S65G/S72A/T203W. 

1 45. The nucleic acid molecule of claim 42 or 43 wherein the amino acid 

2 sequence further comprises a substitution at Y66, wherein the substitution is selected from 

3 Y66H, Y66F, and Y66W. 


1 46. A nucleic acid molecule comprising a nucleotide sequence encoding 

2 a functional engineered fluorescent protein whose amino acid sequence is substantially 

3 identical to the amino acid sequence of Aequorea green fluorescent protein (SEQ ID NO:2) 

4 and which differs from SEQ ID NO:2 by at least an amino acid substitution at L42, V61, 

5 T62, V68, Q69, Q94, N121, Y145, H148, V150, F165, I167, Q183, N185, L220, E222 (not 

6 E222G), or V224, said functional engineered fluorescent protein having a different 

7 fluorescent property than Aequorea green fluorescent protein. 

1 47. The nucleic acid molecule of claim 46 wherein the amino acid 

2 substitution is: 

3 L42X, wherein X is selected from C, F, H, W and Y, 

4 V61X, wherein X is selected from F, Y, H and C, 

5 T62X, wherein X is selected from A, V, F, S, D, N, Q, Y, H and C, 

6 V68X, wherein X is selected from F, Y and H, 

7 Q69X, wherein X is selected from K, R, E and G, 

8 Q94X, wherein X is selected from D, E, H, K and N, 

9 N121X, wherein X is selected from F, H, W and Y, 

10 Y145X, wherein X is selected from W, C, F, L, E, H, K and Q, 

11 H148X, wherein X is selected from F, Y, N, K, Q and R, 

12 V150X, wherein X is selected from F, Y and H, 

13 F165X, wherein X is selected from H, Q, W and Y, 

14 I167X, wherein X is selected from F, Y and H, 

15 Q183X, wherein X is selected from H, Y, E and K, 

16 N185X, wherein X is selected from D, E, H, K and Q, 

1 7 L220X, wherein X is selected from H, N, Q and T, 

1 8 E222X, wherein X is selected from N and Q or 

1 9 V224X, wherein X is selected from H, N, Q, T, F, W and Y. 
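
The position-by-position list recited in claim 47 can be read as a lookup table. The following is a minimal sketch, not part of the claims, that encodes the listed replacements and checks whether a given single substitution appears in the list. The names ALLOWED_SUBSTITUTIONS and is_listed are hypothetical.

```python
# Replacement residues listed for each position in claim 47 (one-letter codes).
ALLOWED_SUBSTITUTIONS = {
    "L42": "CFHWY",
    "V61": "FYHC",
    "T62": "AVFSDNQYHC",
    "V68": "FYH",
    "Q69": "KREG",
    "Q94": "DEHKN",
    "N121": "FHWY",
    "Y145": "WCFLEHKQ",
    "H148": "FYNKQR",
    "V150": "FYH",
    "F165": "HQWY",
    "I167": "FYH",
    "Q183": "HYEK",
    "N185": "DEHKQ",
    "L220": "HNQT",
    "E222": "NQ",
    "V224": "HNQTFWY",
}

def is_listed(substitution):
    """Return True if a label such as 'Q69K' is among the listed replacements."""
    site, mutant = substitution[:-1], substitution[-1]
    return mutant in ALLOWED_SUBSTITUTIONS.get(site, "")

print(is_listed("Q69K"))   # listed replacement at Q69
print(is_listed("E222G"))  # E222G is excluded
```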

1 48. An expression vector comprising expression control sequences 

2 operatively linked to a nucleic acid molecule comprising a nucleotide sequence encoding 

3 a functional engineered fluorescent protein whose amino acid sequence is substantially 

4 identical to the amino acid sequence of Aequorea green fluorescent protein (SEQ ID NO:2) 

5 and which differs from SEQ ID NO:2 by at least an amino acid substitution at L42, V61, 

6 T62, V68, Q69, Q94, N121, Y145, H148, V150, F165, I167, Q183, N185, L220, E222 (not 

7 E222G), or V224, said functional engineered fluorescent protein having a different 

8 fluorescent property than Aequorea green fluorescent protein. 

1 49. The expression vector of claim 48 wherein the amino acid 

2 substitution is: 

3 L42X, wherein X is selected from C, F, H, W and Y, 

4 V61X, wherein X is selected from F, Y, H and C, 

5 T62X, wherein X is selected from A, V, F, S, D, N, Q, Y, H and C, 

6 V68X, wherein X is selected from F, Y and H, 

7 Q69X, wherein X is selected from K, R, E and G, 

8 Q94X, wherein X is selected from D, E, H, K and N, 

9 N121X, wherein X is selected from F, H, W and Y, 

10 Y145X, wherein X is selected from W, C, F, L, E, H, K and Q, 

1 1 H148X, wherein X is selected from F, Y, N, K, Q and R, 

12 V150X, wherein X is selected from F, Y and H, 

13 F165X, wherein X is selected from H, Q, W and Y, 

14 I167X, wherein X is selected from F, Y and H, 

15 Q183X, wherein X is selected from H, Y, E and K, 

16 N185X, wherein X is selected from D, E, H, K and Q, 

1 7 L220X, wherein X is selected from H, N, Q and T, 

1 8 E222X, wherein X is selected from N and Q or 

1 9 V224X, wherein X is selected from H, N, Q, T, F, W and Y. 

1 50. A recombinant host cell comprising an expression vector that 

2 comprises expression control sequences operatively linked to a nucleic acid molecule 

3 comprising a nucleotide sequence encoding a functional engineered fluorescent protein 

4 whose amino acid sequence is substantially identical to the amino acid sequence of 

5 Aequorea green fluorescent protein (SEQ ID NO:2) and which differs from SEQ ID NO:2 

6 by at least an amino acid substitution at L42, V61, T62, V68, Q69, Q94, N121, Y145, 

7 H148, V150, F165, I167, Q183, N185, L220, E222 (not E222G), or V224, said functional 

8 engineered fluorescent protein having a different fluorescent property than Aequorea green 

9 fluorescent protein. 

1 51. The recombinant host cell of claim 50 wherein the amino acid 

2 substitution is: 

3 L42X, wherein X is selected from C, F, H, W and Y, 

4 V61X, wherein X is selected from F, Y, H and C, 

5 T62X, wherein X is selected from A, V, F, S, D, N, Q, Y, H and C, 

6 V68X, wherein X is selected from F, Y and H, 

7 Q69X, wherein X is selected from K, R, E and G, 

8 Q94X, wherein X is selected from D, E, H, K and N, 

9 N121X, wherein X is selected from F, H, W and Y, 

10 Y145X, wherein X is selected from W, C, F, L, E, H, K and Q, 

11 H148X, wherein X is selected from F, Y, N, K, Q and R, 

12 V150X, wherein X is selected from F, Y and H, 

13 F165X, wherein X is selected from H, Q, W and Y, 

14 I167X, wherein X is selected from F, Y and H, 

15 Q183X, wherein X is selected from H, Y, E and K, 

16 N185X, wherein X is selected from D, E, H, K and Q, 

1 7 L220X, wherein X is selected from H, N, Q and T, 

1 8 E222X, wherein X is selected from N and Q or 

1 9 V224X, wherein X is selected from H, N, Q, T, F, W and Y. 

1 52. A functional engineered fluorescent protein whose amino acid 

2 sequence is substantially identical to the amino acid sequence of Aequorea green fluorescent 

3 protein (SEQ ID NO:2) and which differs from SEQ ID NO:2 by at least an amino acid 

4 substitution at L42, V61, T62, V68, Q69, Q94, N121, Y145, H148, V150, F165, I167, 

5 Q183, N185, L220, E222 (E222G), or V224, said functional engineered fluorescent protein 

6 having a different fluorescent property than Aequorea green fluorescent protein. 

1 53. The functional engineered fluorescent protein of claim 52 wherein the 

2 amino acid substitution is: 

3 L42X, wherein X is selected from C, F, H, W and Y, 

4 V61X, wherein X is selected from F, Y, H and C, 

5 T62X, wherein X is selected from A, V, F, S, D, N, Q, Y, H and C, 

6 V68X, wherein X is selected from F, Y and H, 

7 Q69X, wherein X is selected from K, R, E and G, 

8 Q94X, wherein X is selected from D, E, H, K and N, 

9 N121X, wherein X is selected from F, H, W and Y, 

10 Y145X, wherein X is selected from W, C, F, L, E, H, K and Q, 

11 H148X, wherein X is selected from F, Y, N, K, Q and R, 

12 V150X, wherein X is selected from F, Y and H, 

13 F165X, wherein X is selected from H, Q, W and Y, 

14 I167X, wherein X is selected from F, Y and H, 

1 5 Q183X, wherein X is selected from H, Y, E and K, 

1 6 N185X, wherein X is selected from D, E, H, K and Q, 

1 7 L220X, wherein X is selected from H, N, Q and T, 

1 8 E222X, wherein X is selected from N and Q or 

19 V224X, wherein X is selected from H, N, Q, T, F, W and Y. 

1 54. A fluorescently labelled antibody comprising an antibody coupled to 

2 a functional engineered fluorescent protein whose amino acid sequence is substantially 

3 identical to the amino acid sequence of Aequorea green fluorescent protein (SEQ ID NO:2) 

4 and which differs from SEQ ID NO:2 by at least an amino acid substitution at L42, V61, 

5 T62, V68, Q69, Q94, N121, Y145, H148, V150, F165, I167, Q183, N185, L220, E222 (not 

6 E222G), or V224, said functional engineered fluorescent protein having a different 

7 fluorescent property than Aequorea green fluorescent protein. 

1 55. The antibody of claim 54 wherein the amino acid substitution is: 

2 L42X, wherein X is selected from C, F, H, W and Y, 

3 V61X, wherein X is selected from F, Y, H and C, 

4 T62X, wherein X is selected from A, V, F, S, D, N, Q, Y, H and C, 

5 V68X, wherein X is selected from F, Y and H, 

6 Q69X, wherein X is selected from K, R, E and G, 

7 Q94X, wherein X is selected from D, E, H, K and N, 

8 N121X, wherein X is selected from F, H, W and Y, 

9 Y145X, wherein X is selected from W, C, F, L, E, H, K and Q, 

10 H148X, wherein X is selected from F, Y, N, K, Q and R, 

11 V150X, wherein X is selected from F, Y and H, 

12 F165X, wherein X is selected from H, Q, W and Y, 

1 3 I167X, wherein X is selected from F, Y and H, 

14 Q183X, wherein X is selected from H, Y, E and K, 

15 N185X, wherein X is selected from D, E, H, K and Q, 

1 6 L220X, wherein X is selected from H, N, Q and T, 

1 7 E222X, wherein X is selected from N and Q or 

1 8 V224X, wherein X is selected from H, N, Q, T, F, W and Y. 

1 56. A nucleic acid molecule comprising a nucleotide sequence encoding 

2 an antibody fused to a nucleotide sequence encoding a functional engineered fluorescent 

3 protein whose amino acid sequence is substantially identical to the amino acid sequence of 

4 Aequorea green fluorescent protein (SEQ ID NO:2) and which differs from SEQ ID NO:2 

5 by at least an amino acid substitution at L42, V61, T62, V68, Q69, Q94, N121, Y145, 

6 H148, V150, F165, I167, Q183, N185, L220, E222 (not E222G), or V224, said functional 

7 engineered fluorescent protein having a different fluorescent property than Aequorea green 

8 fluorescent protein. 

1 57. The nucleic acid molecule of claim 56 wherein the amino acid 

2 substitution is: 

3 L42X, wherein X is selected from C, F, H, W and Y, 

4 V61X, wherein X is selected from F, Y, H and C, 

5 T62X, wherein X is selected from A, V, F, S, D, N, Q, Y, H and C, 

6 V68X, wherein X is selected from F, Y and H, 

7 Q69X, wherein X is selected from K, R, E and G, 

8 Q94X, wherein X is selected from D, E, H, K and N, 

9 N121X, wherein X is selected from F, H, W and Y, 

10 Y145X, wherein X is selected from W, C, F, L, E, H, K and Q, 

11 H148X, wherein X is selected from F, Y, N, K, Q and R, 

12 V150X, wherein X is selected from F, Y and H, 

13 F165X, wherein X is selected from H, Q, W and Y, 

14 I167X, wherein X is selected from F, Y and H, 

15 Q183X, wherein X is selected from H, Y, E and K, 

16 N185X, wherein X is selected from D, E, H, K and Q, 

1 7 L220X, wherein X is selected from H, N, Q and T, 

18 E222X, wherein X is selected from N and Q or 

1 9 V224X, wherein X is selected from H, N, Q, T, F, W and Y. 

1 58. A fluorescently labelled nucleic acid probe comprising a nucleic acid 

2 probe coupled to a functional engineered fluorescent protein whose amino acid sequence is 

3 substantially identical to the amino acid sequence of Aequorea green fluorescent protein 

4 (SEQ ID NO:2) and which differs from SEQ ID NO:2 by at least an amino acid substitution 

5 at L42, V61, T62, V68, Q69, Q94, N121, Y145, H148, V150, F165, I167, Q183, N185, 

6 L220, E222 (E222G), or V224, said functional engineered fluorescent protein having a 

7 different fluorescent property than Aequorea green fluorescent protein. 

1 59. The probe of claim 58 wherein the amino acid substitution is: 

2 L42X, wherein X is selected from C, F, H, W and Y, 

3 V61X, wherein X is selected from F, Y, H and C, 

4 T62X, wherein X is selected from A, V, F, S, D, N, Q, Y, H and C, 

5 V68X, wherein X is selected from F, Y and H, 

6 Q69X, wherein X is selected from K, R, E and G, 

7 Q94X, wherein X is selected from D, E, H, K and N, 

8 N121X, wherein X is selected from F, H, W and Y, 

9 Y145X, wherein X is selected from W, C, F, L, E, H, K and Q, 

10 H148X, wherein X is selected from F, Y, N, K, Q and R, 

11 V150X, wherein X is selected from F, Y and H, 

12 F165X, wherein X is selected from H, Q, W and Y, 

1 3 I167X, wherein X is selected from F, Y and H, 

14 Q183X, wherein X is selected from H, Y, E and K, 

15 N185X, wherein X is selected from D, E, H, K and Q, 

1 6 L220X, wherein X is selected from H, N, Q and T, 

17 E222X, wherein X is selected from N and Q or 

1 8 V224X, wherein X is selected from H, N, Q, T, F, W and Y. 

1 60. A method for determining whether a mixture contains a target 

2 comprising: 

3 contacting the mixture with a fluorescently labelled probe comprising 

4 a probe and a functional engineered fluorescent protein of claim 27 or claim 52; and 

5 determining whether the target has bound to the probe. 

1 61. The method of claim 60 wherein the target is bound to a solid matrix. 

1 62. A method for engineering a functional engineered fluorescent protein 

2 having a fluorescent property different than Aequorea green fluorescent protein, comprising 

3 substituting an amino acid that is located no more than 0.5 nm from any atom in the 

4 chromophore of an Aequorea-related green fluorescent protein with another amino acid; 

5 whereby the substitution alters a fluorescent property of the protein. 






63. The method of claim 62 wherein the amino acid substitution alters the 
electronic environment of the chromophore. 
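
The 0.5 nm criterion of claim 62 can be evaluated from a set of atomic coordinates. The following is a minimal sketch, not part of the claims, that lists residues having at least one atom within 0.5 nm (5 angstroms) of any chromophore atom. It assumes coordinates in angstroms and assumes the chromophore is formed by residues 65 to 67. The coordinates shown are hypothetical and are not taken from Figs. 5-1 to 5-28.

```python
import math

CUTOFF_ANGSTROM = 5.0  # 0.5 nm expressed in angstroms

def residues_near_chromophore(atoms, chromophore_residues, cutoff=CUTOFF_ANGSTROM):
    """Return sorted residue numbers with any atom within `cutoff` of a chromophore atom.

    atoms: list of (residue_number, atom_name, (x, y, z)), coordinates in angstroms.
    """
    chromophore = [xyz for residue, _, xyz in atoms if residue in chromophore_residues]
    near = set()
    for residue, _, xyz in atoms:
        if residue in chromophore_residues:
            continue  # the chromophore itself is not a candidate for substitution here
        if any(math.dist(xyz, c) <= cutoff for c in chromophore):
            near.add(residue)
    return sorted(near)

# Hypothetical coordinates, for illustration only.
toy_atoms = [
    (66, "CZ", (0.0, 0.0, 0.0)),
    (203, "OG1", (3.0, 0.0, 0.0)),
    (148, "NE2", (0.0, 3.0, 3.0)),
    (10, "CA", (12.0, 0.0, 0.0)),
]
print(residues_near_chromophore(toy_atoms, {65, 66, 67}))
```

With these toy coordinates, residues 148 and 203 fall inside the cutoff and residue 10 does not.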

1 64. A method for engineering a functional engineered fluorescent protein 

2 having a different fluorescent property than Aequorea green fluorescent protein comprising 

3 substituting amino acids in a loop domain of an Aequorea-related green fluorescent protein 

4 with amino acids so as to create a consensus sequence for phosphorylation or for 

5 proteolysis. 

1 65. A method for producing fluorescence resonance energy transfer 

2 comprising: 

3 providing a donor molecule comprising a functional engineered 

4 fluorescent protein of claim 27 or claim 52; 

5 providing an appropriate acceptor molecule for the fluorescent 

6 protein; and 

7 bringing the donor molecule and the acceptor molecule into 

8 sufficiently close contact to allow fluorescence resonance energy transfer. 

1 66. A method for producing fluorescence resonance energy transfer 

2 comprising: 

3 providing an acceptor molecule comprising a functional engineered 

4 fluorescent protein of claim 27 or claim 52; 

5 providing an appropriate donor molecule for the fluorescent protein; 

6 and 

7 bringing the donor molecule and the acceptor molecule into 

8 sufficiently close contact to allow fluorescence resonance energy transfer. 

1 67. The method of claim 66 wherein the donor molecule is an engineered 

2 fluorescent protein whose amino acid sequence comprises the substitution T203I and the 

3 acceptor molecule is a mutant fluorescent protein whose amino acid sequence comprises the 

4 substitution T203X, wherein X is an aromatic amino acid selected from H, Y, W or F, said 

5 functional engineered fluorescent protein having a different fluorescent property than 

6 Aequorea green fluorescent protein. 
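
The requirement of "sufficiently close contact" in claims 65 and 66 reflects the steep distance dependence of fluorescence resonance energy transfer, which is commonly described by the Förster relation E = 1 / (1 + (r/R0)^6). The following is a minimal sketch, not part of the claims, of that relation. The Förster radius of 5 nm used in the example is a hypothetical value chosen for illustration and is not stated in this document.

```python
def fret_efficiency(distance_nm, forster_radius_nm):
    """Transfer efficiency from the Forster relation E = 1 / (1 + (r / R0)**6)."""
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be non-negative and Forster radius positive")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)

# Hypothetical Forster radius of 5 nm, for illustration only.
for r in (2.5, 5.0, 10.0):
    print(r, round(fret_efficiency(r, 5.0), 4))
```

At a separation equal to the Förster radius the efficiency is one half, and doubling the separation reduces it to about 1.5 percent.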

1 68. A nucleic acid molecule comprising a nucleotide sequence encoding 

2 a functional engineered fluorescent protein whose amino acid sequence is substantially 

3 identical to the amino acid sequence of Aequorea green fluorescent protein (SEQ ID NO:2) 

4 and which differs from SEQ ID NO:2 by at least an amino acid substitution located no more 

5 than about 0.5 nm from the chromophore of the engineered fluorescent protein, wherein the 

6 substitution alters the electronic environment of the chromophore, whereby the functional 

7 engineered fluorescent protein has a different fluorescent property than Aequorea green 

8 fluorescent protein. 

1 69. An expression vector comprising expression control sequences 

2 operatively linked to a nucleotide sequence encoding a functional engineered fluorescent 

3 protein whose amino acid sequence is substantially identical to the amino acid sequence of 

4 Aequorea green fluorescent protein (SEQ ID NO:2) and which differs from SEQ ID NO:2 

5 by at least an amino acid substitution located no more than about 0.5 nm from the 

6 chromophore of the engineered fluorescent protein, wherein the substitution alters the 

7 electronic environment of the chromophore, whereby the functional engineered fluorescent 

8 protein has a different fluorescent property than Aequorea green fluorescent protein. 

1 70. A functional engineered fluorescent protein whose amino acid 

2 sequence is substantially identical to the amino acid sequence of Aequorea green fluorescent 

3 protein (SEQ ID NO:2) and which differs from SEQ ID NO:2 by at least an amino acid 

4 substitution located no more than about 0.5 nm from the chromophore of the engineered 

5 fluorescent protein, wherein the substitution alters the electronic environment of the 

6 chromophore, whereby the functional engineered fluorescent protein has a different 

7 fluorescent property than Aequorea green fluorescent protein. 

1 71. A crystal of a protein comprising a fluorescent protein with an amino 

2 acid sequence substantially identical to SEQ ID NO: 2, wherein said crystal diffracts with at 

3 least a 2.0 to 3.0 angstrom resolution. 

1 72. The crystal of claim 71, wherein the fluorescent protein has at least 

2 200 amino acids, a completeness value of at least 80% and has a crystal stability within 

3 0.5% of its unit cell dimensions. 

1 73. The crystal of claim 71, wherein the amino acid sequence comprises a 

2 substitution at S65, wherein the substitution is selected from S65G, S65T, S65A, S65L, 

3 S65C, S65V and S65I. 

1 74. The crystal of claim 71, wherein said crystal has the following unit 

2 cell dimensions in angstroms: a = 51.8, b = 62.8 and c = 70.7 with a space group of P2₁2₁2₁ 

3 and an α angle of 90.00°, a β angle of 90.00° and a γ angle of 90.00° and the crystal has 

4 a diffraction limit where 90% or greater of the potential reflections can be used to determine 

5 the coordinates of the atoms. 
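
For the unit cell dimensions recited in claim 74, the cell volume follows directly from the edge lengths because all three angles are 90 degrees. The following is a minimal sketch, not part of the claims, using the general unit cell volume formula, which reduces to a x b x c for a cell with right angles. The function name is hypothetical.

```python
import math

def unit_cell_volume(a, b, c, alpha_deg=90.0, beta_deg=90.0, gamma_deg=90.0):
    """Volume of a general unit cell; edge lengths in angstroms, angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha_deg, beta_deg, gamma_deg))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)

# Edge lengths from claim 74; angles default to 90 degrees.
volume = unit_cell_volume(51.8, 62.8, 70.7)
print(f"{volume:.0f} cubic angstroms")
```

The result is roughly 2.3 x 10^5 cubic angstroms.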

1 75. A computational method of designing a fluorescent protein 

2 comprising: 

3 determining from a three dimensional model of a crystallized 

4 fluorescent protein comprising a fluorescent protein with a bound ligand, at least one 

5 interacting amino acid of the fluorescent protein that interacts with at least one first 

6 chemical moiety of the ligand, and 

7 selecting at least one chemical modification of the first chemical 

8 moiety to produce a second chemical moiety with a structure to either decrease or increase 

9 an interaction between the interacting amino acid and the second chemical moiety compared 
10 to the interaction between the interacting amino acid and the first chemical moiety. 

1 76. The computational method of claim 75, further comprising generating 

2 the three dimensional model of the crystallized protein comprising a fluorescent protein 

3 with an amino acid sequence substantially identical to SEQ ID NO:2. 

77. The computational method of claim 75, wherein the selecting selects 
the first chemical moiety that interacts with at least one of the amino acids listed in Figs. 5-1 
to 5-28. 

78. The computational method of claim 75, wherein the chemical 
modification enhances hydrogen bonding interaction, charge interaction, hydrophobic 
interaction, Van Der Waals interaction or dipole interaction between the second chemical 
moiety and the interacting amino acid compared to the first chemical moiety and the 
interacting amino acid. 

79. A computational method of modeling the three dimensional structure 
of a fluorescent protein comprising determining a three dimensional relationship between at 
least two atoms listed in the atomic coordinates of Figs. 5-1 to 5-28. 

80. The computational method of claim 79, wherein the determining 
comprises determining the three dimensional structure of a fluorescent protein with an 
amino acid sequence at least 80% identical to SEQ ID NO:2. 

81. The computational method of claim 79, wherein the determining 
comprises determining the three dimensional structure of a fluorescent protein with an 
amino acid sequence at least 95% identical to SEQ ID NO:2. 

82. The computational method of claim 79, wherein the determining 
comprises determining the three dimensional relationship of at least 1500 atoms listed in 
Figs. 5-1 to 5-28. 
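
A three dimensional relationship between atoms, as recited in claim 79, can be as simple as an interatomic distance or the angle formed by three atoms. The following is a minimal sketch, not part of the claims. The whitespace-separated record layout (atom name, residue number, x, y, z) and the coordinate values are hypothetical, chosen for illustration, and do not reproduce the layout or contents of Figs. 5-1 to 5-28.

```python
import math

def parse_record(line):
    """Parse a hypothetical 'atom residue x y z' record into ((atom, residue), (x, y, z))."""
    atom, residue, x, y, z = line.split()
    return (atom, int(residue)), (float(x), float(y), float(z))

def distance(a, b):
    """Euclidean distance between two coordinate tuples."""
    return math.dist(a, b)

def angle_degrees(a, b, c):
    """Angle at atom b, in degrees, formed by atoms a-b-c."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(p * q for p, q in zip(v1, v2))
    cosine = dot / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))

# Hypothetical records, for illustration only.
records = ["CA 65 0.0 0.0 0.0", "CB 65 1.5 0.0 0.0", "OG 65 1.5 2.0 0.0"]
coords = dict(parse_record(line) for line in records)
ca, cb, og = coords[("CA", 65)], coords[("CB", 65)], coords[("OG", 65)]
print(distance(ca, cb), angle_degrees(ca, cb, og))
```

The same two functions apply unchanged to any pair or triple of atoms once their coordinates have been read.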

83. A device comprising a storage device and, stored in the device, at 
least 10 atomic coordinates selected from the atomic coordinates listed in Figs. 5-1 to 5-28. 

1 84. The device of claim 83, wherein the storage device is a computer 

2 readable device that stores code that receives as input the atomic coordinates. 

1 85. The device of claim 84, wherein the computer readable device is a floppy 

2 disk or a hard drive. 

3 86. A nucleic acid molecule comprising a nucleotide sequence encoding a functional 

4 engineered fluorescent protein whose amino acid sequence is substantially identical to 

5 the amino acid sequence of Aequorea green fluorescent protein (SEQ ID NO:2) and 

6 which differs from SEQ ID NO:2 by at least a substitution at Q69, wherein said 

7 functional engineered fluorescent protein has a different fluorescent property than 

8 Aequorea green fluorescent protein. 

9 87. The nucleic acid molecule of claim 86, wherein said substitution at Q69 is selected 

1 0 from the group of K, R, E and G. 

1 1 88. The nucleic acid molecule of claim 86, wherein said amino acid sequence further 

12 comprises a function mutation at S65. 

13 89. A nucleic acid molecule comprising a nucleotide sequence encoding a functional 

1 4 engineered fluorescent protein whose amino acid sequence is substantially identical to 

15 the amino acid sequence of Aequorea green fluorescent protein (SEQ ID NO:2) and 

1 6 which differs from SEQ ID NO:2 by at least a substitution at E222, but not including 

17 E222G, wherein said functional engineered fluorescent protein has a different 

1 8 fluorescent property than Aequorea green fluorescent protein. 

1 9 90. The nucleic acid molecule of claim 89, wherein said substitution at E222 is selected 
2 0 from the group of N and Q. 

21 91. The nucleic acid molecule of claim 89, wherein said amino acid sequence further 
2 2 comprises a function mutation at F64. 

92. A nucleic acid molecule comprising a nucleotide sequence encoding a functional engineered fluorescent protein whose amino acid sequence is substantially identical to the amino acid sequence of Aequorea green fluorescent protein (SEQ ID NO:2) and which differs from SEQ ID NO:2 by at least a substitution at Y145, wherein said functional engineered fluorescent protein has a different fluorescent property than Aequorea green fluorescent protein.

93. The nucleic acid molecule of claim 92, wherein said substitution at Y145 is selected from the group of W, C, F, L, E, H, K and Q.

94. The nucleic acid molecule of claim 92, wherein said amino acid sequence further comprises a function mutation at Y66.

95. A method of identifying a test chemical, comprising:

contacting a test chemical with a sample containing a biological entity labeled with a functional, engineered fluorescent protein or a polynucleotide encoding said functional, engineered fluorescent protein, and

detecting fluorescence of said functional engineered fluorescent protein.

96. The method of claim 95, wherein said fluorescence in the presence of a test chemical is greater than in the absence of said test chemical.

97. The method of claim 96, wherein said polynucleotide encoding said functional, engineered fluorescent protein is operatively linked to a genomic polynucleotide.

98. The method of claim 95, wherein said functional, engineered fluorescent protein is fused to a second functional protein.

99. The method of claim 96, wherein said polynucleotide encoding said functional, engineered fluorescent protein is operatively linked to a response element.

100. The method of claim 96, wherein said polynucleotide encoding said functional, engineered fluorescent protein is operatively linked to a response element in a mammalian cell.
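
Claims 83 to 85 refer to stored code that receives atomic coordinates as input. As a minimal sketch only, and not the code contemplated by the claims, the Python below reads one whitespace-delimited ATOM row. It assumes the ten-field layout that the complete rows of the coordinate listing in this document appear to follow: record type, serial number, atom name, residue name, residue number, x, y, z, and two trailing numeric columns taken here to be occupancy and temperature factor. The names `Atom` and `parse_atom_row` are hypothetical and are introduced purely for illustration.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    """One row of the coordinate listing (field meanings are assumed)."""
    serial: int
    name: str
    residue: str
    residue_number: int
    x: float
    y: float
    z: float
    occupancy: float      # assumed meaning of the ninth field
    b_factor: float       # assumed meaning of the tenth field


def parse_atom_row(row: str) -> Atom:
    """Parse a single whitespace-delimited ATOM row into an Atom.

    Raises ValueError when the row does not have exactly ten fields,
    does not start with ATOM, or holds a non-numeric value where a
    number is expected.
    """
    fields = row.split()
    if len(fields) != 10 or fields[0] != "ATOM":
        raise ValueError(f"not a ten-field ATOM row: {row!r}")
    serial = int(fields[1])
    residue_number = int(fields[4])
    x, y, z, occupancy, b_factor = (float(value) for value in fields[5:10])
    return Atom(serial, fields[2], fields[3], residue_number,
                x, y, z, occupancy, b_factor)
```

For example, `parse_atom_row("ATOM 613 N ASP 82 32.298 17.843 50.841 1.00 12.20")` returns an `Atom` whose `serial` is 613 and whose `residue` is `"ASP"`. A row in which two fields are fused, such as one ending in `1.00100.00`, does not split into ten fields and is rejected rather than guessed at.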
Figure 1b

WO 98/06737    PCT/US97/14593

Figure 2a

Figure 2b
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36.617 
34.629 
33.971 
34.479 
34.456 
32 .467 
31.734 
32 .015 
34.931 
35.433 
34.516 
34. 179 
36.839 
37.422 
37.728 
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32.745 
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34.340 
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34.691 
34.628 
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35 . 344 
34.447 
33.105 
32.950 
32. 127 
34.805 
35.401 
35.0S7 
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5.005 
5 . 579 



1.00 30.53 
1.00 23.21 
1.00 16.55 
1.00 25.70 
1.00 16.39 
1.00 22.28 
1.00 29.60 
1.00 20.43 
1.00 30.87 
1.00 31.75 
1.00 33.85 
1.00 20.12 
1.00 12.88 
1.00 14.37 
1.00 13.42 
1.00 15.01 
1.00 17.57 
1.00 16.55 
1.00 14.72 
1.00 10.76 
1.00 7.65 
1.00 15.14 
1.00 17.36 
1.00 19.69 
1.00 15.41 
1.00 14.91 
1.00 12.93 
1.00 12.08 
1.00 11.04 
1.00 16.54 
1.00 IS. 08 
1.00 11.56 
1.00 16.15 
1.00 13.85 
1.00 14.82 
1.00 3.62 
1.00 10.00 
1.00 21.25 
1.00 40.50 
1.00 46.97 
1.00 49.22 
1.00 10.56 
1.00 10.23 
1.00 9.47 
1.00 16.72 
1.00 12.85 
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1.00 14.24 
1.00 14.43 
1.00 13.61 
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CG ASP 
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GLY 
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144 OD1 ASP 
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135 



138 



140 
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11. 118 
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12.801 
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42.338 
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32 . 579 
32.931 
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1.00 15.13 
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1.00 27.40 
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1.00 19.66 
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1.00 12.26 
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206 
207 
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211 
212 
213 
214 
21S 
216 
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218 
219 
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221 
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223 
224 



SER 
SER 
SER 
SER 
SER 
VAL
VAL



C VAL

O VAL

CB VAL

CG1 VAL

CG2 VAL 

N SER 

CA SER 

C SER 

O SER 

CB SER 

OG SER 

N GLY 

CA GLY 

C GLY 

O GLY 

N GLU
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CB GLU 

CG GLU 

CD GLU 

OE1 GLU 
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12.579 
12.699 
10.143 
9.510 
13.335 
14.361 
14.258 
14.058 
15.768 
16.826 
15.989 
14.462 
14.535 
15.917 
16.398 
13.471 
12.249 
16.480 
17.718 
17.737 
17.149 
18.459 
18.622 
20.079 
20.734 
17.761 
16.264 
IS. 501 
IS. 996 
14.292 
20.534 
21.860 
22.236 
21.390 
23.525 
23.971 
25.220 
2S.926 
24. 180 
24.948 
24.879 
25.861 
23.653 
25.461 
26.611 
27.293 
26.650 
28.594 
29.367 
30.396 
31.469 
30.032 
30.681 
31.236 
30.587 
30.015 
30.818 
32. 181 
33 .084 
30.070 
32.307 
23.581 
34.705 



32 . 479 
21. 749 
31.933 
31.702 
31.678 
30.902 
30.093 
28.6.14 
28.266 
30. 570 
29.559 
32.001 
27.781 
26.351 
25.818 
26.157 
2 5.603 
25.667 
24.926 
24.321 
22.816 
22.324 
22. 112 
20.670 
20.297 
20.9 46 
19.893 
20.187 
19. 547 
13. 767 
20.022 
19.207 
IS. 687 
17.602 
15.919 
17.453 



17.760 
15. 114 
15.261 
14.020 
12.352 
13. 719 
16.222 
16.502 
15.192 
14. 161 
15.238 
14.061 
14. 505 
15.004 
13.457 
12.066 
11.519 
11.515 
14.402 
14. 846 



11.186 
30.379 
29.167 
31.086 
32.3S3 
21.073 
30.435 
30.817 
21.987 
30.839 
30.234 
30.3S7 
29.324 
30.011 
29.571 
28.513 
29.202 
29.882 
30.364 
29.977 
30.249 
21.176 
29.433 
29.570 
29.262 
28.456 
28.543 
28.618 
27.468 
26.698 
27.337 
29.822 
29.518 
30.467 
31.011 
30.702 
31.621 
22.367 
21.944 
30.927 
29.624 
28.796 
28.534 
28.430 
33.485 
34.315 
34.662 
34.750 
34.860 
35.221 
26.233 
25.879 
33.948 
24.075 
33,141 
25.248 
27.490 
23.582 
2a. 637 
39.331 
9.916 
7.945 



1.00 15.59 
1.00 15.96 
1.00 18.99 
1.00 14.48 
1.00 31.95 
1.00 16.73 
1.00 14.06 
1.00 6.80 
1.00 10.65 
1.00 17.96 
1.00 15.30 
1.00 16.37 
1.00 11.31 
1.00 17.96 
1.00 11.26 
1.00 13.17 
1.00 19.91 
1.00 48.74 
1.00 9.88 
1.00 12.44 
1.00 13.16 
1.00 12.41 
1.00 13.44 
1.00 13.73 
1.00 17.33 
1.00 15.56 
1.00 12.67 
1 .00 26.43 
1.00 21.13 
1.00 23.45 
1.00 30.63 
1.00 15.36 
1.00 12.84 
1.00 14.69 
1.00 13.56 
1.00 15.15 
1.00 18.14 
1.00 16.26 
1.00 18.67 
1.00 22.53 
1.00 33.78 
1.00 55.15 
1.00 45.29 
1.00 56.26 
1.00 11.20 
1.00 10.62 
1.00 19.92 
1.00 16.69 
1.00 16.92 
1.00 16.19 
1.00 13.94 
1.00 15.77 
1.00 19.98 
1.00 31.92 
1.00 30.97 
1.00 25.22 
1.00 13.40 
1.00 12.98 
1.00 21.94 
l.OO 12.61 
1.00 11.49 
1.00 15.63 
1.00 19.9 4 
1.00 25.61 
1.00 17.29 
1.00 22.57 
1.00 29.86 
38.742 17.306 44.694 1.00 15.41
ATOM 557 N ASP 76 39.433 12.756 46.038 1.00 18.63
ATOM 558 CA ASP 76 39.269 11.770 47.062 1.00 16.19
ATOM 559 C ASP 76 39.581 12.280 48.431 1.00 15.92
ATOM 560 O ASP 76 38.862 12.042 49.389 1.00 17.35
ATOM 561 CB ASP 76 40.083 10.507 46.790 1.00 18.69
ATOM 562 CG ASP 76 39.826 9.432 47.825 1.00 24.04
ATOM 563 OD1 ASP 76 40.523 9.268 48.817 1.00 29.72
ATOM 564 OD2 ASP 76 38.732 3.743 47.584 1.00 40.96
ATOM 565 N HIS 77 40.647 12.984 43.£61 1.00 18.79
ATOM 566 CA HIS 77 40.978 13.418 49.877 1.00 19.35
ATOM 567 C HIS 77 40.117 14.507 50.357 1.00 24.£7
ATOM 568 O HIS 77 40.205 14.826 51.551 1.00 27.IS
ATOM 569 CB HIS 77 42.435 13.806 50.042 1.00 19.84
ATOM 570 CG HIS 77 42.743 15.035 49.322 1.00 17.31
ATOM 571 ND1 HIS 77 42.925 ?.028 47.953 1.00 21.86
ATOM 572 CD2 HIS 77 42.925 15.295 49.774 1.00 18.70
ATOM 573 CE1 HIS 77 43.203 16.289 47.593 1.00 17.49
ATOM 574 NE2 HIS 77 43.213 17.069 43.668 1.00 18.11
ATOM 575 N MSE 78 39.277 15.069 49.565 1.00 2S.36
ATOM 576 CA MSE 78 38.412 16.140 50.026 1.00 24.65
ATOM 577 C MSE 78 36.920 15.774 50.066 1.00 26.47
ATOM 578 O MSE 78 36.070 16.636 50.260 1.00 28.16
ATOM 579 CB MSE 78 ? ? 49.121 1.00 26.38
ATOM 580 CG MSE 78 39.803 18.177 49.406 1.00 27.01
ATOM 581 SE MSE 78 39.987 19.608 43.117 1.00 43.09
ATOM 582 CE MSE 78 38.874 20.873 49.044 1.00 27.11
ATOM 583 N LYS 79 36.606 14.509 49.856 1.00 18.68
ATOM 584 CA LYS 79 35.216 14.061 49.853 1.00 21.54
ATOM 585 C LYS 79 34.406 14.449 ?.082 1.00 20.21
ATOM 586 O LYS 79 33.136 ?.652 ?.025 1.00 21.08
ATOM 587 CB LYS 79 35.152 12.581 49.612 1.00 23.48
ATOM 588 CG LYS 79 35.859 12.225 43.317 1.00 41.09
ATOM 589 CD LYS 79 35.159 ? 47.535 1.00 34.65
ATOM 590 CE LYS 79 35.796 ?.881 ?.131 1.00 53.46
ATOM 591 NZ LYS 79 35.084 ? ?.080 1.00 49.?
ATOM 592 N ARG 80 35.069 ?.S42 ?.213 1.00 19.77
ATOM 593 CA ARG 80 34.365 ?.874 ?.434 1.00 20.?
ATOM 594 C ARG 80 33.898 ? ?.481 1.00 26.42
ATOM 595 O ARG 80 ? ? ? 1.00 23.?
ATOM 596 CB ARG 80 ? ?.549 ?.TOO 1.00 24.?
ATOM 597 CG ARG 80 36.204 15.620 55.034 1.00 29.71
ATOM 598 CD ARG 80 36.964 15.344 56.335 1.00 61.20
ATOM 599 NE ARG 80 36.S51 16.230 57.415 1.00 71.14
ATOM 600 CZ ARG 80 37.398 16.882 58.192 1.00 100.00
ATOM 601 NH1 ARG 80 38.714 16.758 58.040 1.00 100.00
ATOM 602 NH2 ARG 80 36.917 17.679 59.155 1.00 99.06
ATOM 603 N HIS 81 34.275 17.121 52.473 1.00 18.77
ATOM 604 CA HIS 81 33.903 18.547 52.499 1.00 19.60
ATOM 605 C HIS 81 32.841 18.883 51.486 1.00 18.62
ATOM 606 O HIS 81 32.S57 20.043 51.295 1.00 17.76
ATOM 607 CB HIS 81 35.129 19.472 52.283 1.00 20.39
ATOM 608 CG HIS 81 36.221 19.224 53.305 1.00 28.02
ATOM 609 ND1 HIS 81 36.127 19.701 54.618 1.00 30.59
ATOM 610 CD2 HIS 81 37.392 18.535 53.202 1.00 29.02
ATOM 611 CE1 HIS 81 37.218 19.308 55.265 1.00 26.24
ATOM 612 NE2 HIS 81 37.991 18.603 54.452 1.00 28.18
ATOM 613 N ASP 82 32.298 17.843 50.841 1.00 12.20
ATOM 614 CA ASP 82 31.358 18.011 49.769 1.00 13.24
ATOM 615 C ASP 82 29.922 18.148 50.259 1.00 24.30
ATOM 616 O ASP 82 29.175 17.195 50.243 1.00 16.55
ATOM 617 CB ASP 82 31.480 16.917 48.730 1.00 12.23
ATOM 618 CG ASP 82 30.642 17.209 47.518 1.00 9.92
ATOM 619 OD1 ASP 82 29.870 18.134 47.459 1.00 20.31
ATOM 620 OD2 ASP 82 30.938 16.466 46.507 1.00 11.12
ATOM 621 N PHE 83 29.566 19.353 50.705 1.00 23.66
ATOM 622 CA PHE 83 28.220 19.634 51.201 1.00 20.23
ATOM 623 C PHE 83 27.154 19.333 50.168 1.00 20.93
ATOM 624 O PHE 83 26.116 13.733 50.503 1.00 15.97
ATOM 625 CB PHE 83 28.077 21.106 £1.666 1.00 19.59
ATOM 626 CG PHE 83 26.624 21.613 51.805 1.00 16.91
ATOM 627 CD1 PHE 83 25.946 21.498 53.021 1.00 17.76
ATOM 628 CD2 PHE 83 25.968 22.236 50.734 1.00 18.88
ATOM 629 CE1 PHE 83 24.635 21.960 53.156 1.00 24.13
ATOM 630 CE2 PHE 83 24.650 22.690 50.840 1.00 19.24
ATOM 631 CZ PHE 83 24.001 22.575 52.068 1.00 20.67
ATOM 632 N PHE 84 27.432 19.784 48.921 1.00 14.06
ATOM 633 CA PHE 84 26.515 19.693 47.8G9 1.00 12.96
ATOM 634 C PHE 84 25.893 13.332 47.602 1.00 24.96
ATOM 635 O PHE 84 24.674 13.200 47.534 1.00 21.55
ATOM 636 CB PHE 84 27.085 20.265 46.513 1.00 13.44
ATOM 637 CG PHE 84 27.630 21.645 46.721 1.00 14.27
ATOM 638 CD1 PHE 84 29.001 21.845 46.S90 1.00 15.17
ATOM 639 CD2 PHE 84 26.781 22.753 46.752 1.00 13.48
ATOM 640 CE1 PHE 84 29.520 23.129 47.073 1.00 14.63
ATOM 641 CE2 PHE 84 27.276 24.041 46.969 1.00 16.34
ATOM 642 CZ PHE 84 28.650 24.221 47.137 1.00 15.77
ATOM 643 N LYS 85 26.738 17.330 47.482 1.00 14.07
ATOM 644 CA LYS 85 26.294 15.985 47.283 1.00 13.30
ATOM 645 C LYS 85 25.657 15.371 48.547 1.00 13.43
ATOM 646 O LYS 85 24.773 14.509 48.429 1.00 ie.46
ATOM 647 CB LYS 85 27.434 15.089 46.757 1.00 17.38
ATOM 648 CG LYS 85 27.873 15.372 45.323 1.00 13.93
ATOM 649 CD LYS 85 28.969 14.381 44.888 1.00 13.23
ATOM 650 CE LYS 85 29.766 14.819 43.662 1.00 10.36
ATOM 651 NZ LYS 85 30.319 16.185 43.773 1.00 12.92
ATOM 652 N SER 86 26.119 15.795 49.752 1.00 11.03
ATOM 653 CA SER 86 25.610 15.267 50.998 1.00 12.09
ATOM 654 C SER 86 24.156 15.639 51.240 1.00 21.58
ATOM 655 O SER 86 23.452 14.979 52.013 1.00 19.89
ATOM 656 CB SER 86 26.448 15.661 52.208 1.00 16.45
ATOM 657 OG SER 86 26.308 17.042 S2.495 1.00 22.05
ATOM 658 N ALA 87 23.705 16.698 50.5S2 1.00 15.09
ATOM 659 CA ALA 87 22.333 17.138 50.762 1.00 19.52
ATOM 660 C ALA 87 21.337 16.399 49.870 1.00 13.60
ATOM 661 O ALA 87 20.162 16.557 50.040 1.00 19.55
ATOM 662 CB ALA 87 22.204 13.647 50.632 1.00 19.23
ATOM 663 N MSE 88 21.235 15.536 43.976 1.00 14.05
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ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 



665 

666

667 

668 

669

670 

671 

672 

673 

674 

675 

676 

677 

678 

679 

680 

681 

682 

683 

684 



689 
690 
691 
692 
693 
694 
695
696 
697 
698 
699 
700 
701 
702 
703 
704
705 
706 
707 
708 
709 
710 
711 
712 



CA MSE 

C MSE 

O MSE 

CB MSE 

CG MSE 

SE MSE

CE MSE 

N PRO 

CA PRO 

C PRO 

O PRO 

CB PRO 

CG PRO 

CD PRO 

N GLU 

CA GLU 

C GLU 

O GLU 

CB GLU 

CG GLU 

CD GLU 

OE1 GLU 

OE2 GLU 

N GLY

CA GLY 

C GLY 

O GLY 

N TYR 

CA TYR 

C TYR 

O TYR 

CB TYR 

CG TYR 

CD1 TYR 

CD2 TYR

CE1 TYR

CE2 TYR

CZ TYR 



N VAL 

CA VAL 

C VAL 

O VAL 

CB VAL 

CG1 VAL

CG2 VAL 

N GLN 

CA GLN 

C GLN 

O GLN 

CB GLN

CG GLN 

CD GLN 

OE1 GLN 

NE2 GLN 
N GLU

CA GLU 

C GLU 

O GLU 

CB GLU

CG GLU 

CD GLU 

OE1 GLU 

OE2 GLU 

ARG



21.007 
20.496 
21. 109 
21.848 
22.263 
20.737 
21.31S 
19.363 
18.552 
17.572 
17.085 
17.733 
17.726 
18.844 
17.278 
16.348 
16.701 
15.833 
16.031 
15.782 
17.071 
18. 179 
16.875 
17.977 
18.394 
18.673 
18.769 
18.861 
19. 143 
18.575 
18.270 
20.678 
21.546 
21.620 
22.317 
22.404 
23.067 
23.155 
23.S44 
18.447 
13.025 
19.231 
20. 172 
17.073 
16.855 
15.716 
19.361 
20.480 
19.948 
19.153 
21.232 
22.361 
23-431 
23.805 
23.719 
20.396 
19.974 
21.149 
22.206 
19.277 
18.009 
17.657 



ARG 
ARG 



14.796 
13.448 
12.876 
14.593 
15.891 
16.894 
18.684 
12.930 
13.475 
14.611 
15.301 
12.294 
11.261 
11.642 
14.795 
15.838 
17.229 
18.042 
15.816 
14.403 
13.641 
14.151 
12.373 
17.509 
18.769 
19.911 
19.764 
21.086 
22.266 
23.478 
23.483 
22.488 
22.468 
23.576 
21.350 
23.561 
21. 300 
22.424 
22.393 
24. 504 
25.822 

26. 625 
26.480 
27.937 
25.764 
27. 345 
28.195 
29.583 
29.788 
27.727 
23.708 
27.999 
26.879 
23. 527 
20. 531 
31.899 
32.804 
32. 623 
32.427 
51.684 
32.016 
33 . 166 
30. 987 
23.838 
34.783 



48.035 
48.579 
49 .457 
46.791 
46. 131 
45. 394 
45 . 748 
48.084 
47.008 
47.385 
46.493 
46.494 
47-607 
48. 560 
48.695 
49.157 
48.645 
48.368 
50.682 
51. 228 
51.447 
51.342 
51. 749 
48.510 
47.906 
48.839 
50.055 
48.225 
48.994 
48.347 
47. 144 
49.278 
48.012 
47. 166 
47.683 
46.00= 
46.504 
45.683 
44.517 
49.189 
48.778 
48.625 
49.451 
49.791 
49.413 
49. 771 
47.521 
47.227 
46.998 
46.061 
45.934 
45. 469 
44.632 
44.946 
43.449 
47.820 
47 . 643 
47 .398 
47 . 985 
48.87e 
49.215 
50.622 
51.011 
51.423 
46.601 
46. 342 
46. 206 



1.00 15.32 
1.00 21.43 
1.00 23. C3 
1.00 15. =3 
1.00 10.56 
1 .00 31.39 
1 .00 28. £5 
1.00 14.73 
1.00 14.50 
1.00 12.10 
1.00 18.06 
1.00 17.00 
1.00 15.33 
1.00 17.16 
1.00 14.63 
1.00 20.63 
1.00 25.53 

l.oo : 



.00 22 



21 



1.00 37. 
1.00 83.49 
l.OO 54. ao 
1.00 64.63 
1.00 21.39 
1.00 17.77 
1.00 12.17 
1.00 15.31 
1.00 12.02 
1.00 1C.33 
1.00 9.87 
1.00 15.29 
1.00 15.40 
1.00 15.13 
1.00 14.75 
1.00 16.09 
1.00 6. SO 
1.00 15.12 
1.00 18.13 
1.00 13.37 
1.00 11.93 
1.00 14.74 
1.00 16 . CO 
1.00 15.15 
1.00 23.45 
1.00 26.05 
1.00 22.50 
1.00 12.73 
1.00 10.53 
1.00 12.23 
1.00 15.32 
i.00 -.95 
1.00 11.37 
1.00 12.04 
1.00 12.50 
1.00 7.98 
1.00 11.73 
1.00 13.4? 
1.00 13.42 
1.00 15.23 
1.00 13.. 32 
1.00 22.46 
1.00 45.93 
1. 00100.30 
1 .00 61.53 
1.00 15.51 
1.00 15.37 
1.00 15.54 
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ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 



739 
740 
741 
742 
743 



ARG 
ARG
ARG 
ARG 



NH1 ARG
NH2 ARG
N THR
CA THR
C THR
O THR
CB THR

744 OG1 THR 

745 CG2 THR 

746 N ILE 

747 CA ILE 

748 C ILE 

749 O ILE 

750 CB ILE 

751 CG1 ILE
752 CG2 ILE
753 CD1 ILE
754 N PHE

CA PHE 
C PHE 
O PHE
CB PHE 
CG PHE 
CD1 PHE 
CD2 PHE
CE1 PHE
CE2 PHE 
CZ PHE 



757 
758 
759 



763 
764 
765 
766 



770 
771 
772 
773 
774 
775 
776 
777 
778 
779 
780 
781 
782 
783 



PHE 
PHE



PHE 
CD1 PHE 
CD2 PHE
CE1 PHE
CE2 PHE
CZ PHE
N LYS
CA LYS 
C LYS 



CG 



CB ASP
CG ASP 
OD1 ASP 
OD2 ASP 



100 
100 
100 
100 



102 

102 

102 

102 



103
103
103



20.389 
22-582 
23.495 
24.615 
25.411 
25 .434 
24.684 
26.236 
22 .470 
22.368 
23.593 
24.686 
22.282 
21.225 
22.038 
23.396 
24.486 
24.533 
23.628 
24.385 
24.480 
25.457 
23.875 
25.613 
25.719 
26.514 
27.696 
26.401 
25.638 
25.863 
24.698 
25.175 
23.992 
24.235 
25.906 
26.679 
27.254 
26.599 
25,927 
25.537 
24.426 
26.317 
24.087 
25.965 
24.852 
28.603 
29.270 
28. 732 
28.658 
30.784 
31.518 
33.036 
33.797 
28.353 



26.292 
28.840 
30. 109 
31.206 
29. 886 
25.813 
24. 602 
23.608 

14.899 



33.247 
33.453 
22 .277 
21.493 
:i.709 
30.430 
37.068 
38.424 
38.688 
38.347 
39.442 
39.101 
40.804 
39.219 
39.526 
41.017 
41.566 
38.752 
37.236 
39.231 
26.431 
41.673 
43.098 
43.441 
43.164 
43.770 
43.624 
42.524 
44.585 
42.400 
44.469 
43.369 
44.085 
44.522 
45.eS5 



42.484 
41.230 
41.192 
40.567 
45.946 
47. 179 
48.349 
48.304 
47.069 
4S.252 
48.060 
49. 116 
49.403 
SO. 618 
30.356 
; I .061 
SI. 269 
51.629 



45 .806 
44.967 
44.929 
43.908 
43 .766 
42.693 
41.615 
42 .714 
46.344 
45.935 
45.084 
45.485 
47.066 
47.945 
46.445 
43.899 
42.977 
42.686 
42.075 
41.660 
41.890 
40-679 
40.73B 
43. 110 
42.896 
41. 699 
41.700 
44.084 
45.356 
46. 189 
45.743 
47.400 
46.946 
47.771 
40.704 
29.554 
2 3.872 
40.308 
38.226 
27.764 
28.325 
26.843 
27.975 
2 5.43 5 
37.014 
39.737 
40.085 
39.287 
28.072 
29.950 
40.551 
40.534 
41.332 
29.997 
39.368 
23.549 
27.586 
23.516 
29.296 
23.931 
40.464 
23.933 
22.233 



1.00 15.01 
1-00 16.19 
1.00 17.61 
1.00 9.06 
1.00 9.88 
1.00 20.03 
1.00 15.29 
1.00 11.03 
1.00 13.39 
1.00 13.12 
1.00 16.31 
1.00 19.25 
1.00 26.27 
1.00 31.43 
1.00 15.90 
1.00 16.23 
1.00 16.70 
1.00 21.10 
1.00 14.58 
1.00 13.47 
1.00 16.09 
1.00 13.30 
1.00 13.93 
1.00 14.86 
1-00 12.44 
1.00 20.27 
1.00 20.07 
1.00 15.96 
1.00 21.41 
1.00 24.98 
1.00 22.94 
1.00 32.06 
1.00 24.26 
1.00 28.19 
1.00 12.53 
1.00 



.00 . 



.81 



1.00 20.31 
1.00 5.94 
1.00 12.75 
1.00 16.31 
1.00 15.27 
1.00 13.50 
1.00 21.25 
1.00 21.06 
1.00 15.49 
1.00 17.93 
1.00 13.71 
1.00 17.18 
1.00 17.13 
1.00 18.01 
1.00 26.70 
1.00 41.58 
1.00 18.09 
1.00 23.08 
1.00 25.42 
1.00 23.34 
1.00 26.27 
1.00 57.01 
63.23 



.00 • 



. 66 



1.00 20.17 
1.00 15.70 
1.00 18.47 
1.00 17.72 
1.00 19.29 
1.00 22.93 
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ATOM 
ATOM 
ATOM 
ATOM 



800
801
802
803
804
805
806 
807 
808 
809 
810 
811 
812 
813 
814 
815 
816 
817 
818 
819 
820 
821 
822 
823 
824 
825 
826 
827 
828
829 
830 
831 
832 
833 
834
835 
836 
837 
838 
839 
840 
841 
842 
843 
844 
845 
846 
847 
848 
849 
850 
851 
852 
853 
854
855
856
857
858
859
860
861
862



OD1 ASP
OD2 ASP
N GLY



O GLY 

N ASN

CA ASN 

C ASN 

O ASN 

CB ASN 

CG ASN 

OD1 ASN

ND2 ASN 
TYR 



CA 



LYS 
LYS 
LYS 
LYS 
LYS 
THR 
THR 



CB 



ARG 
ARG 
ARG 
ARG 
ARG 
NH1 ARG 
NH2 ARG
N ALA
CA ALA
C ALA
O ALA
CB ALA
GLU 
GLU 
GLU 
GLU 



CA 



105 
105 
105 
105 
105 
105 
106 
106 
106 



109 
109 

110
110
110



CD GLU
OE1 GLU
OE2 GLU



24.238 
22.774 
22.612 
21.598 
22.055 
23.202 
21.125 
21.425 
20.399 
19.255 
21.605 
20.359 
19.565 
20. 178 
20.826 
19.966 
19.763 
20. 678 
20.547 
20. 619 
19.952 
21.373 
20.038 
21.481 
20.814 
20.970 
18. S33 
18. 194 
17.619 
16.704 
17.217 
17.860 
18.528 
18.205 
17.774 
17.463 
18.043 
18.847 
20.064 
19.123 
16.560 
16.212 
15.939 
15.239 
15.069 
14. 767 
13.400 
12.821 
12.968 
13.630 
12.432 
16.577 
16.377 
16.346 
16. 829 
17.465 
15. 770 
15.741 
16.438 
16.086 
14.303 
13.744 
12.247 



46.900 
45. 619 
45.211 
44. 967 
43 .703 
42.620 
42.9'11 
43.840 
44.366 
43.601 
45 . 674 
41.365 
40.219 
39.543 
39.404 
39.123 
39. 398 
40.458 
38.524 
40. 632 
38.692 
39.751 
39.931 
39. 115 
39.349 
37.037 
37.010 
39.063 
39.631 
40.974 
35.951 
34.658 
33 . 696 
33.734 
34.034 
33.791 
34.968 
32.804 
31.751 
20.498 
30. 509 
32. 100 
30.995 
31. 160 
29.854 
29.244 
29.815 
28.041 
29 .414 
28. 207 
26.979 
26.965 
23.059 
25. 939 
24.655 
23-678 
23.545 
24. 122 
24.242 
24 .280 
2 2 . S 4 3 



34.688 
36.283 
38.646 
39.498 
40. 180 
40.085 
40.872 
41.510 
41. 181 
40.824 
43.001 
43.697 
44.259 
43.659 
41.328 
41.156 
42.475 
43.281 
40.246 
38.793 
38. 173 
38.006 



36.025 
34.670 
42.709 
43.897 
43.397 
42.562 
44.823 
46.060 
45.793 
43.835 
43.352 
44.468 
45.582 
42.410 
43. 137 
41.264 
44. 154 
45.048 
44.2S4 
43.249 
4S.959 
46.932 
47.610 
47.883 
49.035 
50.046 
49.195 
44.635 
43.870 
44.734 
45.869 
42.S22 
44. 175 
44.823 
43.925 
42 . 771 
44.993 
46.299 
46.372 
45.432 
47.250 
44 . 457 



1.00 19.05 
1.00 23.89 
1.00 20.17 
1.00 20.22 
1.00 24.68 
1.00 18.06 
1.00 15.71 
I. 00 8.89 
1.00 21.85 
1.00 15.17 
1.00 8.58 
1.00 43.57 
1.00 36.67 
1.00 36.47 
1.00 16.80 
1.00 13.90 
1.00 11.05 
1.00 13.86 
1.00 15.88 
1.00 15.57 
1.00 13.14 
1.00 13.35 
1.00 13.44 
1.00 10.87 
1.00 15.93 
1.00 17.32 
1.00 12.39 
1.00 11.51 
1.00 17.25 
1.00 13.14 
1.00 14.82 
1.00 40.71 
1.00 43.48 
1.00 14.95 
1.00 11.97 
1.00 15.81 
1.00 13.68 
1.00 23.81 
1-00 13.88 
i.00 13.04 
1.00 13.57 
1.00 12.56 
1.00 13.07 
1.00 12.52 
1.00 17.32 
1.00 17.92 
1.00 19.99 
1.00 36.05 
1.00 55.71 
1.00 44.11 
1.00 94.34 
1.00 13.26 
1.00 12.68 
1.00 13.15 
1.00 16.75 
1.00 17.31 
1.00 15.39 
1.00 15.24 
i.00 12.08 
1.00 15.70 
1.00 19.20 
1.00 23.62 
I. 00 60.99 
1.00 76.05 
1.00 54.87 
1,00 10.73 
1.00 10.98 
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ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM
ATOM
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM



868

869 
870 
871 
872 
873 
874 
875 
876 
877 
878 
879 
880 
881 
882 
883 
884 
885 
886 



890 
891 
892 
893 
894 
895 
896 
897 
898 
899 
900 
901 
902 
903
904 
905 
906 



909 
910 
911 
912 
913 
914 
915 
916 
917 
918 
919 
920 
921 
922 
923 
924 
925 
926 
927 
928
929 
930 
931 



C VAL 

O VAL 

CB VAL

CG1 VAL

CG2 VAL

N LYS 

CA LYS 

C LYS 

O LYS 

CB LYS 

CG LYS 

CD LYS 

CE LYS 

NZ LYS 

N PHE 

CA PHE 

C PHE 

O PHE 

CB PHE 

CG PHE 

CD1 PHE 

CD2 PHE 

CE1 PHE 

CE2 PHE 

CZ PHE 

N GLU

CA GLU 

C GLU 

O GLU 

CB GLU 

CG GLU 

CD GLU 

OE1 GLU 

OE2 GLU 

N GLY 



CA 



GLY 
GLY 
GLY 
N ASP 
CA ASP 
C ASP 
O ASP 
CB ASP 
CG ASP 
OD1 ASP 
OD2 ASP 
N THR 
CA THR 
C THR 
O THR 
CB THR 
OG1 THR 
CG2 THR 
LEU 



CA 



LEU
C LEU
O LEU 
CB LEU 
CG LEU 
CD1 LEU 
CD2 LEU 
N VAL
CA 



113 
113 
113 
114 
114 
114 
114 
114 
114 
114 
114 



VAL 
VAL 



17.96a 
18.271 
19.428 
19.966 
20.452 
17.415 
17. 175 
16.822 
16.695 
16.032 
14-792 
13.509 
12.526 
12.379 
16.683 
16.325 
14.806 
14.110 
16.866 
18.231 
19.344 
18.403 
20.627 
19.673 
20.780 
14.354 
12.978 
13.121 
13.434 
12.348 
11.856 
10.742 
10.181 
10.460 
13.005 
13.225 
14.727 
15.516 
15.137 
16.572 
17.237 
18.423 
17.055 
16.624 
16.230 
16.805 
16.463 
16.889 
17.186 
16.493 
15.806 
15.552 
16.217 
18.284 
18.679 
18.036 
18. 194 
20.243 
20.845 
20.701 
20.366 
17.230 
16.466 
16.929 
17. 135 
14.939 
14.133 



20.630 
20.438 
22.358 
23 . 704 
21.232 
19.732 
18.421 
17 . 485 



19.084 
18.321 
19.134 
20.518 
16.208 
15.175 
14. 975 
14.878 
13.838 
13.536 
13. 795 
13.009 
13. S00 
12. 708 
12.953 
14. 815 
14.473 
13. 193 
13.207 
15.481 
16.747 
16.460 
15.395 
17.461 
12.087 
10.861 
10.767 
10.922 
10.564 
10.462 
11.677 
11.672 

9.074 

3.677 

9.468 

7.391 
12.729 
13.981 
14.988 
15.064 
14.497 
13.508 
15.793 
15.681 
16.706 
17.992 

;a.368 

IS. 815 33.8 



44.261 
45-432 
43.012 
43.487 
43-078 
43.S16 
44.045 
42.931 
41.808 
45.036 
44.376 
44.703 
45.528 
45.036 
43.267 
42.317 
42.181 
43.160 
42.838 
42.338 
43.139 
41.056 
42 .665 
40.572 
41.387 
^0.966 
40.642 
39.906 
38.730 
39.667 
40.376 
41 .342 
41.431 
42.079 
40.585 
39.869 
39.641 
40.570 
38.439 
38.233 
37.598 
37.265 
37.733 
36.348 
35.495 
36.130 
37.493 
36.910 
37.976 
33.996 
35.952 
34.990 
35.275 
37.805 
38.759 
38.269 
37.091 



. 167 
.311 
. 595 
. 797 
.039 
.039 
. 556 



1.00 3.62 
1.00 15.63 
1.00 22.75 
1.00 16.69 
1.00 18.47 
1.00 14.67 
1.00 16.41 
1.00 7.11 
1.00 16.27 
1.00 22.50 
1.00 20.40 
1.00 44.65 
1.00 54.02 
1. 00100.00 
1.00 10.09 
1.00 11.41 
1.00 14.18 
1.00 15.03 
1.00 12.89 
1.00 16.80 
1.00 18.61 
1.00 19.50 
1 .00 22.78 
1.00 25.35 
1.00 23.99 
1.00 15.29 
1.00 11.40 
1.00 13.30 
1.00 18.72 
1.00 9.68 
1.00 19.54 
1.00 38.12 
1.00 34.84 
.00 27.88 
.00 14.51 



.00 : 



91 



.00 23.: 

.00 19.35 

.00 20.26 

.00 28.00 

.00 22.39 

.00 21.33 

.00 33.06 

.00 55.04 

.00 59.57 

.00 82.48 

.00 19.62 

.00 18.21 

.00 18.92 

.00 15.94 

.00 19.03 

.00 21.42 

.00 15.49 

.00 13.66 

.00 13.50 

.00 8.81 

.00 12.49 

.00 12.25 

1.00 3.90 



39.95! 

39.669 1.00 10.11 
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ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 



940 
941
942 
943 
944 
945 
946 
947 
948 
949 
950 
951 
952 
953 
954 
955 
956 
957 
958
959 
960 
961 
962 
963 
964 
965 
966 
967 
968 
969 
970 
971 
972 
973 
974 
975 
976 
977 
978 
979 
980 
981 
982 
983 
984 
985 
986 
987 
988
989 
990 
991 
992 



CA 



CG2 VAL 
N ASN
CA ASN 
C ASN 
O ASN 
CB ASN 
CG ASN 
OD1 ASN 
ND2 ASN 
ARG
ARG 
ARG 
ARG 
ARG 
ARG 
CD ARG 
N ILE 
CA ILE 
C ILE 
O ILE 
CB ILE 
CG1 ILE
CG2 ILE 
CD1 ILE 
N GLU 
CA GLU 
C GLU 
O GLU 
CB GLU 
CG GLU 
N LEU 
CA LEU 
C LEU 
O LEU 
CB LEU
CG LEU 
CD1 LEU 
CD2 LEU 
LYS 



CA 



LYS 
LYS 
LYS 



LYS 
LYS 
LYS 
GLY 
GLY 
GLY 
GLY 



CG1 ILE
CG2 ILE
CD1 ILE
ASP 



121
121 
122 
122 
122 
122 
122 
122 
122 
123 



123 
123 
123 
123 
123 
124 
124 
124 
124 
124 
124 
125 



ASP 
ASP 
ASP 



14. 501 
17.067 
17.424 
16.301 
16.195 
18.753 
19.201 
18.773 
20.124 
15.470 
14.348 
14.622 
14.749 
13.068 
12.478 
11.282 
14.663 
15.030 
13.991 
13.370 
16.296 
17.316 
16.944 
17.6S2 
13.953 
13. 139 
14. 168 
14.91S 
12.028 
12.387 
14.183 
15.092 
14.420 
13.563 
15.976 
17.003 
18.302 
16.511 
14.890 
14.391 
15.563 
16.489 
13.611 
12.853 
11.356 
10.652 
11.229 
15.514 
16.551 
16.012 
14.961 
16.706 
16.282 
17.405 
18.562 
15.432 
16.408 
14.272 
15.824 
16.599 



18.351 
22. Ill 
23.405 
24.382 
24.802 
23.928 
25.261 
25.654 
25.938 
24.706 
2S. 610 
26.946 
27.011 
25.025 
23.921 
23 .244 
27.992 
29.340 
30. 450 
30.535 
29.757 
28.585 
30.993 
28.242 
31.358 
32.572 
33.713 
33.797 
32.677 
33.337 
34.550 
35.654 
37.011 
37.267 
35.533 
36.683 
36.083 
37.732 
37.897 
39.260 
40.276 
40.2-46 
39.487 
40.786 
40.601 
41.929 
42.988 
41.127 
42.151 
43.572 
43.908 
44.404 
45.787 
46.789 
46.496 
46.052 
45.888 
45. 120 
46.391 
43.002 
49.124 
49.086 
49.632 
49.407 
=0.077 
5.0.842 
49.740 



38.246 
38.839 
39.400 
39.060 
37.934 
38.791 
39.367 
40.461 
38.670 
40.029 
39.82 5 
40.498 
41.723 
40. 417 
39.589 
40.281 
39.680 
40.095 
39.835 
38.765 
39.292 
39.180 
39.918 
37.743 
40.793 
40.700 
40.811 
41.780 
41.751 
43.089 
39.808 
39.767 
39.722 
38.893 
38.510 
33.375 
37.849 
37.367 
40.554 
40.579 
40.445 
41.246 
41.877 
41.923 
41.675 
41.521 
42.367 
39.411 
39. 121 
39.272 
33.693 
40.070 
40.243 
40. 196 
40.429 
41.504 
42.701 
41.577 
44.013 
39.918 
39.882 
33.801 
33.953 
41.253 
42.226 
41.SS3 
43.475 



1.00 15.35 
1.00 12.24 
1-00 11.73 
1-00 11.18 
1.00 11.09 
1.00 11.41 
1.00 11.07 
1.00 12.06 
1.00 11. SO 
1.00 13.69 
1.00 12.99 
1.00 5.89 
1.00 14.47 
1.00 15.99 
1.00 30.23 
1.00 60.61 
1.00 11.46 
1.00 11.86 
1.00 10.54 
1.00 12.83 
1.00 15.41 
1.00 12.27 
1.00 14. CI 
1.00 7.74 
1.00 11.36 
1.00 15-20 
1.00 11.93 
1.00 15.61 
1.00 19.74 
1.00 72.94 
1.00 12.19 
1.00 15.00 
1.00 19.2 5 
1.00 13.41 
1.00 14.29 
1.00 17.55 
1.00 13.46 
1.00 12.09 
1.00 12.73 



1.00 17.21 
1.00 33.94 
1.00 60.87 
1.00 52.70 
1.00 47.22 
1.00 18.71 
1.00 17.32 
1.00 25.32 
1.00 20.14 
1.00 18.42 
1.00 21.04 
1.00 25.93 
1.00 19.37 
1.00 23.82 " 
1.00 23.86 
1.00 28.95 
1.00 2 9.89 
1.00 20-26 
1.00 12. S3 
1.00 20.36 
1.00 24.23 
1.00 20.57 
1.00-4 3.70 
1 .00 49-42 
1.00 2 3.07 
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ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 
ATOM 



1000
1001
1002
1003
1004
1005
1006 
1007 
1008 
1009 
1010 
1011 
1012 
1013 
1014 
1015 
1016 
1017 
1018 
1019 
1020 
1021 
1022 
1023 
1024 
1025 
1026 
1027 
1028 
1029 
1030 
1031 
1032 
1033 
1034 
1035 
1036 
1037 
1038 
1039 
1040 
1041 
1042 
1043 
1044 
1045 
1046 
1047 
1048 
1049 
1050
1051 
1052 
1053 
1054 
1055
1056 
1057 
1058 
1059 
1060 
1061 
1062 
1063 



PHE 

CA PHE 
C PHE 
O PHE 
CB PHE
CG PHE 
CD1 PHE 
CD2 PHE 
CE1 PHE 
CE2 PHE 
CZ PHE 
LYS 
LYS 
LYS 
LYS 
LYS 
LYS 
LYS 
GLU 
GLU 
GLU 
GLU 
GLU 
ASP 
ASP 



CA 



CA 



C ASP 
O ASP 
CB ASP
CG ASP 
OD1 ASP 
OD2 ASP 
N GLY
CA GLY
C GLY
O GLY
N ASN
CA ASN 
C ASN 
O ASN 
CB ASN
CG ASN 
OD1 ASN 
ND2 ASN
N ILE
CA ILE 
C ILE 
O ILE 
CB ILE 
CG1 ILE
CG2 ILE 
CD1 ILE 
LEU 
CA LEU 
C LEU 
O LEU 
CB LEU



CD1 LEU
CD2 LEU

N GLY



130 
130 
130 
13 0 
130 
130 
130 
131 
131 
131 
131 
131 
131 
131 
132 



132 
133 
133 
133 
133 
133 
133 
133 



134 
134 
134 
135 

135 



18.510 
19.433 
19.330 
18.242 
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